Transcript
00:00I think we should get into the specifics, but my goodness, $500 million is quite a large Series B.
00:05What do you need that level of capital for, and what does it reflect?
00:10Yeah, so I mean, I would say, very happy to be here, and thank you.
00:14Really, what it reflects is, on the one hand, very strong confidence from our lead
00:20investors in the product.
00:22Jane Street and Situational Awareness have led our round.
00:28On Jane Street's side, they're absolute technical experts.
00:32They understand the kind of product we're doing.
00:34And then Situational Awareness, that's Leopold Aschenbrenner's fund.
00:37He wrote the book on AGI, and he really understands where this whole space is going.
00:44As for what we are looking to do with this product and with this money: firstly, I would say the demand for
00:53LLM compute is just insatiable.
00:54All of the frontier labs are looking at where this space is going, and they're all concerned:
00:58"I'm going to run out of silicon.
00:59I won't be able to serve all the demand I've got."
01:02So our goal overall has been to make the highest throughput per square millimetre of silicon that any product has.
01:09Right, this is about computational density.
01:11That's right.
01:11That's right.
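
A minimal sketch of the metric being discussed, throughput per unit of silicon area; both numbers below are illustrative assumptions, not the company's actual specs:

```python
# Compute density as "throughput per square millimetre of silicon".
# Both figures are illustrative assumptions, not the company's actual specs.
peak_flops = 2.0e15      # assumed ~2 PFLOP/s of low-precision matmul throughput
die_area_mm2 = 800.0     # assumed ~800 mm^2 die

compute_density = peak_flops / die_area_mm2
print(f"{compute_density / 1e12:.1f} TFLOP/s per mm^2")   # -> 2.5 TFLOP/s per mm^2
```
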
01:12And so you guys are basically saying throughput in terms of flops per millimetre squared.
01:17This is something you can own.
01:19What was the breakthrough?
01:20What is it that you're so good at to achieve this?
01:23Yeah, so there's really a combination of two things.
01:26If you look at the products in the market previously, there's been the HBM-based family, which is NVIDIA, Google,
01:32Amazon, and then there's been the SRAM-based family.
01:35And you are SRAM.
01:36We are both, actually, uniquely.
01:38So it's kind of taking two good ideas and putting them together.
01:40It is possible to do both very high throughput, as you get from HBM, but also very low latency, as
01:47you get from SRAM, and do that in the same product.
01:49What it gives you is actually a product that is better than any other product in the market at throughput.
01:54This is exactly the flops per square millimetre that you described, while also matching some of the best, like
01:59Cerebras and Groq, at latency.
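
To make the throughput-versus-latency trade-off concrete, here is a back-of-the-envelope sketch of bandwidth-bound decode latency; the model size and bandwidth figures are illustrative assumptions, not numbers from the interview:

```python
# At small batch sizes, per-token decode latency is roughly
# (bytes of weights read per token) / (memory bandwidth).
# All figures are illustrative assumptions, not numbers from the interview.

def per_token_latency_ms(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    return weight_bytes / bandwidth_bytes_per_s * 1e3

weight_bytes = 70e9      # assumed 70B-parameter model at 1 byte per parameter
hbm_bw = 8e12            # assumed ~8 TB/s of HBM bandwidth
sram_bw = 80e12          # assumed ~80 TB/s of aggregate on-chip SRAM bandwidth

print(f"weights in HBM : {per_token_latency_ms(weight_bytes, hbm_bw):.1f} ms/token")
print(f"weights in SRAM: {per_token_latency_ms(weight_bytes, sram_bw):.2f} ms/token")
```

In this framing, SRAM buys latency and HBM buys capacity and throughput, and the hybrid described here aims to get both in the same product.
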
02:01Let's talk about how quickly people can start deploying this, Reiner, because your aim is to complete the final
02:07design this year.
02:09You hope to start manufacturing, even shipping, in 2027.
02:12Who do you need to partner with on that?
02:14Do you expect to be manufacturing here in the U.S. or abroad?
02:18Yeah, so, I mean, there's a few big parts of the supply chain, and this is common for us as
02:22well as many other semiconductor firms in this space.
02:26Really, you need logic wafers, memory wafers, which is HBM, and then you need rack build-outs.
02:32And so those are the big parts of our supply chain.
02:35TSMC is well-recognized as the best provider of logic wafers.
02:38And then the memory wafers, there's the big three, which are SK Hynix, Samsung, and Micron.
02:45And then there's a whole range of providers across the rack and manufacturing side.
02:51One of the big things is that if you want to manufacture in very large volumes – and we hear about these
02:56multi-gigawatt deals that are coming out –
03:00these require billions of dollars of manufacturing, and hundreds of millions of
03:07dollars put into setting up supply chains in advance of delivering that.
03:10And so that's a big part of what we're excited to be able to do now.
03:14You left Google in 2022, and the goal was to create a better chip from scratch, Reiner.
03:19But have you been impressed by the leaps that TPU has taken?
03:23It seems to have impressed the market.
03:24What is it that you felt wasn't there for you at Google that you can now build better?
03:30Yeah.
03:31So I think what is really required is if you want to absolutely nail the LLM workload, you have to
03:35be willing to break compatibility with previous chips.
03:38And so one of the strong guarantees you see all of the existing players providing is you can take a
03:44program that was written on my previous generation chip or my generation of chips five years ago, and it will
03:48run on my next generation chip.
03:50And so a lot of what that means is there are constraints: my chip has to support all of
03:55the previous number formats I supported,
03:57it has to support all of the different programming models,
03:59the way I communicate between cores on the chip – all of those have to stay the same.
04:04We felt that if you really want to just absolutely nail this workload, without regard
04:09for backwards compatibility or other workloads or anything like that,
04:12something of a blank-slate design is required.
04:16For us, this means very large matrices, very low precision support, and then, in fact, an ability to split your
04:23very large systolic array into small pieces.
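
As a toy illustration of splitting one large array into independent pieces, here is a sketch of a big matrix multiply decomposed into tiles that could each map to their own sub-array; the tile size and partitioning scheme are illustrative assumptions, not the actual hardware design:

```python
# Toy sketch: one large matmul decomposed into independent tiles, the way a big
# systolic array might be logically split into smaller sub-arrays.
# Tile size and partitioning scheme are illustrative assumptions.
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 64) -> np.ndarray:
    m, k = a.shape
    _, n = b.shape
    out = np.zeros((m, n), dtype=a.dtype)
    # Each (i, j) output block is independent, so each could run on its own
    # sub-array; a small workload could occupy only a few tiles.
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                out[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
    return out

a = np.random.rand(256, 192).astype(np.float32)
b = np.random.rand(192, 320).astype(np.float32)
assert np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3)
```

Because the output tiles are independent, a partitioned array of this kind can also be shared across smaller workloads rather than dedicated to one large one.
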
04:25You name-checked Groq, I think, with some admiration a second ago.
04:29I mean, when NVIDIA acquired Groq, Jensen Huang's view was that they were struggling to find their place in
04:35the world, in the market.
04:36And for what it's worth, Cerebras, which you name-checked as well, filed confidentially for an IPO yesterday.
04:41Why might you succeed where Groq had to go to NVIDIA, and, I guess, they're working on something there,
04:49and where the public markets are needed for capital going forward?
04:54Yeah.
04:55So I would say, historically, the market has been won by the HBM-based players.
05:01That's Google, Amazon, NVIDIA, and not Groq and Cerebras.
05:06SRAM-only chips are very good for latency, but when you want to run very long-context models, you run
05:12out of memory capacity.
05:13SRAM is too small.
05:14It's fast but too small.
05:16Really, the hybrid of doing weights in SRAM, so you get the low latency, as well as having the HBM
05:22for very long-context support –
05:24we believe that's what enables the low latency without all of the compromises that you would get otherwise.
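
To put rough numbers on the capacity argument, here is a sketch of a long-context KV-cache footprint against typical per-chip SRAM and HBM capacities; the model shape and capacity figures are illustrative assumptions, not numbers from the interview:

```python
# Rough sketch of why long context outgrows on-chip SRAM.
# Model shape and capacity figures are illustrative assumptions.

def kv_cache_bytes(context_len: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    # Keys and values are stored for every token, at every layer.
    return context_len * layers * kv_heads * head_dim * 2 * bytes_per_elem

sram_capacity = 256e6    # assumed a few hundred MB of on-chip SRAM per chip
hbm_capacity = 192e9     # assumed ~192 GB of HBM per chip

cache = kv_cache_bytes(context_len=500_000, layers=60, kv_heads=8, head_dim=128)
print(f"KV cache at 500k tokens: {cache / 1e9:.0f} GB")
print(f"fits in SRAM: {cache < sram_capacity}, fits in HBM: {cache < hbm_capacity}")
```

In this split, the fixed-size weights sit in fast SRAM for latency, while the much larger, context-dependent state lives in HBM.
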
05:29Reiner Pope, thank you so much for joining us today.