Transcript
00:00What are they going to tell us in Las Vegas this week?
00:02What is the future of the TPU program?
00:05So what we're reporting is that they're probably going to announce an inference chip, for running AI models after they've been trained.
00:14Thus far, they've been doing training and inference on one chip.
00:17We're expecting and reporting that they're probably going to announce something separate just for inference.
00:22Google chief scientist Jeff Dean told me in an interview that, the way inference demand is growing, quote, "it now becomes sensible to specialize chips more for training and more for inference workloads," and they're looking at a bunch of things.
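To make that split concrete, here is a minimal JAX sketch, with toy shapes and names that are purely hypothetical, of how the two workloads stress hardware differently. It illustrates the general idea, not any actual TPU design.

```python
# A toy two-layer model; all names and shapes here are illustrative.
import jax
import jax.numpy as jnp

def forward(params, x):
    h = jnp.tanh(x @ params["W1"])
    return h @ params["W2"]

def loss(params, x, y):
    return jnp.mean((forward(params, x) - y) ** 2)

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = {"W1": jax.random.normal(k1, (512, 512)),
          "W2": jax.random.normal(k2, (512, 512))}

# Training step: forward pass PLUS backward pass plus a weight update.
# It wants large batches, memory for activations, higher-precision
# accumulation, and fast chip-to-chip links for gradient exchange.
grads = jax.grad(loss)(params, jnp.ones((256, 512)), jnp.ones((256, 512)))
params = jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)

# Inference step: forward pass only. What dominates instead is latency
# per request, memory bandwidth for streaming weights, and cost per
# token, which is what an inference-only chip can optimize for.
y = forward(params, jnp.ones((1, 512)))
```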
00:35Their chip chief, Amin Vahdat, declined to tell me specifically whether they're going to announce that this week, but said we'd be hearing more soon.
00:42And this continues a broader trend.
00:45NVIDIA announced a fast inference chip from what they'd acquired from Groq.
00:50You mentioned the Cerebras IPO before.
00:52That's also, at this point, really a low-latency, fast-inference play.
00:57Meanwhile, the TPU play has been extraordinary, with adoption by companies many would call rivals, Meta likely wanting in on the Google-made chips.
01:08So how are they broadening, and where do you think they're going to be getting supply from?
01:12There's a lot of reporting in the market, for example, that maybe they turn to Marvell versus
01:16Broadcom.
01:17I know you can't comment on that directly, but how are they thinking about their own supply
01:20chain?
01:21Look, I think for them, supply is a problem.
01:25I was talking to Google DeepMind CEO Demis Hassabis, and he mentioned that as well.
01:29Look, you have Meta, which signed what they told us is a multi-billion-dollar, multi-year deal to use TPUs.
01:35They're just getting their first big tranche of them.
01:38They're trying to figure out what they're going to do with them.
01:40Anthropic has a huge deal.
01:42Citadel is going to talk about how they're using TPUs at the Google Next conference this
01:46week.
01:47And what Demis was telling me was that, contrary to what Jensen Huang said last week on the Dwarkesh Patel podcast, namely that it's really just Anthropic that wants them, they actually have a lot of people who are interested.
02:00They don't have enough supply.
02:03Demis was saying to me, look, what they end up doing is prioritizing the top-of-the-line frontier-lab customers, because those are the customers who are most capable of taking advantage of what the TPU has to offer.
02:15More broadly, I think the appeal of the TPU play to these big frontier labs is that Google is the only maker of one of the top frontier models that also makes AI accelerator chips in large volume.
02:34OpenAI has said they will as well.
02:36I think that's why it's worth lingering for a minute on why the TPU is useful.
02:41So you explained really well that, to this point, the TPU, or Tensor Processing Unit, has basically been a general-purpose accelerator for training or inference.
02:50But when you put it side by side against NVIDIA's latest GPU or other inference-specific chips, as many people try to do, what is it about Google owning the architecture, about designing it specifically for the inference use case, that makes it better?
03:04Is it money?
03:05Is it power?
03:06What do we need to know?
03:07Google's argument is that they know what you need for training and running a top-of-the-line model.
03:16There are two things in the last couple of months that have really increased the interest
03:21in Google TPU.
03:22One is the Anthropic deal, which is a validation of the technology.
03:27The other was the release of the latest version of Gemini, which was trained, and is running its inference, on Google TPUs, and the strong reviews that it's gotten.
03:36Google uses data, requests, and information from their own AI model teams to figure out what they need to prioritize and, frankly, what they need to fix in the chip business.
03:48Working together, they figured out that, for example, utilization was too low on those chips when you were using them for reinforcement learning.
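To see why a reinforcement learning workload can leave an accelerator underused, consider a hedged JAX sketch of the pattern; the model and sizes below are made up for illustration.

```python
# Illustrative only: why an RL loop can leave an accelerator idle.
import jax
import jax.numpy as jnp

W = jax.random.normal(jax.random.PRNGKey(0), (4096, 4096))  # toy "model"

def decode_one_token(x):
    # Autoregressive generation is one (1, d) x (d, d) matvec per token.
    # Arithmetic intensity is low: the chip mostly waits on memory
    # traffic for the weights, so its math units sit underutilized.
    return jnp.tanh(x @ W)

def update_step(batch):
    # The learning phase is one large (B, d) x (d, d) matmul: high
    # arithmetic intensity, so the same chip is suddenly well used.
    return jnp.tanh(batch @ W)

x = jnp.ones((1, 4096))
for _ in range(128):              # generation phase: 128 sequential matvecs
    x = decode_one_token(x)

out = update_step(jnp.ones((1024, 4096)))  # update phase: one big matmul
```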
03:58Demis was telling me they're using that data to figure out how precise the chips have to be, as opposed to where they can save money, and that's a set of data that flows into the Google TPU design team that other chipmakers don't necessarily have.
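The precision-versus-cost trade described here can be sketched in a few lines: cast weights to a cheaper format and measure how far the output moves. The toy below assumes nothing about Google's actual methodology.

```python
# Toy precision experiment; real studies would use production models.
import jax
import jax.numpy as jnp

k0, k1 = jax.random.split(jax.random.PRNGKey(0))
W32 = jax.random.normal(k0, (1024, 1024), dtype=jnp.float32)
x = jax.random.normal(k1, (8, 1024), dtype=jnp.float32)

ref = x @ W32                                   # full-precision reference
W16 = W32.astype(jnp.bfloat16)                  # half the bytes to move
approx = (x.astype(jnp.bfloat16) @ W16).astype(jnp.float32)

rel_err = jnp.linalg.norm(approx - ref) / jnp.linalg.norm(ref)
print(f"relative error from bfloat16 matmul: {rel_err:.2e}")
# If the error stays inside the model's quality tolerance, the lower
# precision is a direct saving in silicon area, memory, and power.
```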
04:13NVIDIA does have a very solid model team, but among the big three frontier-model makers, Anthropic, OpenAI, and Google, Google is the only one making AI accelerator chips at volume right now.