Transcript
00:00I think we should get into the specifics, but my goodness, $500 million is quite a large Series B.
00:05What do you need that level of capital for, and what does it reflect?
00:10Yeah, so I mean, I would say, very happy to be here, and thank you.
00:14Really, what it reflects is, on the one hand, very strong confidence from our lead
00:20investors in the product.
00:22Jane Street and Situational Awareness have led our round.
00:28On Jane Street's side, they're absolute technical experts.
00:32They understand the kind of product we're doing.
00:34And then Situational Awareness, that's Leopold Aschenbrenner's fund.
00:37He wrote the book on AGI, and he really understands where this whole space is going.
00:44As for what we are looking to do with this product and with this money: firstly, I would say the demand for
00:53LLM compute is just insatiable.
00:54All of the frontier labs are looking at where this space is going, and they're all concerned:
00:58"I'm going to run out of silicon.
00:59I won't be able to serve all the demand I've got."
01:02So our goal overall has been to make the highest throughput per square millimetre of silicon that any product has.
01:09Right, this is about computational density.
01:11That's right.
01:11That's right.
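
A minimal sketch of the metric being discussed, throughput per unit of silicon area; both numbers below are illustrative assumptions, not the company's actual specs:

```python
# Compute density as "throughput per square millimetre of silicon".
# Both figures are illustrative assumptions, not the company's actual specs.
peak_flops = 2.0e15      # assumed ~2 PFLOP/s of low-precision matmul throughput
die_area_mm2 = 800.0     # assumed ~800 mm^2 die

compute_density = peak_flops / die_area_mm2
print(f"{compute_density / 1e12:.1f} TFLOP/s per mm^2")   # -> 2.5 TFLOP/s per mm^2
```
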
01:12And so you guys are basically saying throughput in terms of flops per millimetre squared.
01:17This is something you can own.
01:19What was the breakthrough?
01:20What is it that you're so good at to achieve this?
01:23Yeah, so there's really a combination of two things.
01:26If you look at the products in the market previously, there's been the HBM-based family, which is NVIDIA, Google,
01:32Amazon, and then there's been the SRAM-based family.
01:35And you are SRAM.
01:36We are both, actually, uniquely.
01:38So it's kind of taking two good ideas and putting them together.
01:40It is possible to do both very high throughput, as you get from HBM, but also very low latency, as
01:47you get from SRAM, and do that in the same product.
01:49What it gives you is actually a product that is better than any other product in the market at throughput.
01:54This is exactly the flops per square millimetre that you described, while also matching some of the best, like
01:59Cerebras and Groq, at latency.
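
To make the throughput-versus-latency trade-off concrete, here is a back-of-the-envelope sketch of bandwidth-bound decode latency; the model size and bandwidth figures are illustrative assumptions, not numbers from the interview:

```python
# At small batch sizes, per-token decode latency is roughly
# (bytes of weights read per token) / (memory bandwidth).
# All figures are illustrative assumptions, not numbers from the interview.

def per_token_latency_ms(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    return weight_bytes / bandwidth_bytes_per_s * 1e3

weight_bytes = 70e9      # assumed 70B-parameter model at 1 byte per parameter
hbm_bw = 8e12            # assumed ~8 TB/s of HBM bandwidth
sram_bw = 80e12          # assumed ~80 TB/s of aggregate on-chip SRAM bandwidth

print(f"weights in HBM : {per_token_latency_ms(weight_bytes, hbm_bw):.1f} ms/token")
print(f"weights in SRAM: {per_token_latency_ms(weight_bytes, sram_bw):.2f} ms/token")
```

In this framing, SRAM buys latency and HBM buys capacity and throughput, and the hybrid described here aims to get both in the same product.
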
02:01Let's talk about how quickly people can start deploying this, Reiner, because your aim is to complete the final
02:07design this year.
02:09You hope to start manufacturing, even shipping, in 2027.
02:12Who do you need to partner with on that?
02:14Do you expect to be manufacturing here in the U.S. or abroad?
02:18Yeah, so, I mean, there's a few big parts of the supply chain, and this is common for us as
02:22well as many other semiconductor firms in this space.
02:26Really, you need logic wafers, memory wafers, which is HBM, and then you need rack build-outs.
02:32And so those are the big parts of our supply chain.
02:35TSMC is well-recognized as the best provider of logic wafers.
02:38And then the memory wafers, there's the big three, which are SK Hynix, Samsung, and Micron.
02:45And then there's a whole range of providers across the rack and manufacturing side.
02:51One of the big things is that if you want to manufacture in very large volumes – and we hear about these
02:56multi-gigawatt deals that are coming out –
03:00these require billions of dollars of manufacturing, and hundreds of millions of
03:07dollars put into setting up supply chains in advance of delivering that.
03:10And so that's a big part of what we're excited to be able to do now.
03:14You left Google in 2022, and the goal was to create a better chip from scratch, Reiner.
03:19But have you been impressed by the leaps that TPU has taken?
03:23It seems to have impressed the market.
03:24What is it that you felt wasn't there for you at Google that you can now build better?
03:30Yeah.
03:31So I think what is really required is if you want to absolutely nail the LLM workload, you have to
03:35be willing to break compatibility with previous chips.
03:38And so one of the strong guarantees you see all of the existing players providing is you can take a
03:44program that was written on my previous generation chip or my generation of chips five years ago, and it will
03:48run on my next generation chip.
03:50And so a lot of what that means is there are constraints: my chip has to support all of
03:55the previous number formats I supported,
03:57it has to support all of the different programming models,
03:59the way I communicate between cores on the chip – all of those have to stay the same.
04:04We felt that if you really want to just absolutely nail this workload, without regard
04:09for backwards compatibility or other workloads or anything like that,
04:12something of a blank-slate design is required.
04:16For us, this means very large matrices, very low precision support, and then, in fact, an ability to split your
04:23very large systolic array into small pieces.
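
As a toy illustration of splitting one large array into independent pieces, here is a sketch of a big matrix multiply decomposed into tiles that could each map to their own sub-array; the tile size and partitioning scheme are illustrative assumptions, not the actual hardware design:

```python
# Toy sketch: one large matmul decomposed into independent tiles, the way a big
# systolic array might be logically split into smaller sub-arrays.
# Tile size and partitioning scheme are illustrative assumptions.
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 64) -> np.ndarray:
    m, k = a.shape
    _, n = b.shape
    out = np.zeros((m, n), dtype=a.dtype)
    # Each (i, j) output block is independent, so each could run on its own
    # sub-array; a small workload could occupy only a few tiles.
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                out[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
    return out

a = np.random.rand(256, 192).astype(np.float32)
b = np.random.rand(192, 320).astype(np.float32)
assert np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3)
```

Because the output tiles are independent, a partitioned array of this kind can also be shared across smaller workloads rather than dedicated to one large one.
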
04:25You name-checked Groq, I think, with some admiration a second ago.
04:29I mean, when NVIDIA acquired Groq, Jensen Huang's view was that they were struggling to find their place in
04:35the world, in the market.
04:36And for what it's worth, Cerebras, which you name-checked as well, filed confidentially for an IPO yesterday.
04:41Why might you succeed where Groq had to go to NVIDIA, and, I guess, they're working on something there,
04:49and where the public markets are needed for capital going forward?
04:54Yeah.
04:55So I would say, historically, the market has been won by the HBM-based players.
05:01That's Google, Amazon, NVIDIA, and not Groq and Cerebras.
05:06SRAM-only chips are very good for latency, but when you want to run very long-context models, you run
05:12out of memory capacity.
05:13SRAM is too small.
05:14It's fast but too small.
05:16Really, the hybrid of doing weights in SRAM, so you get the low latency, as well as having the HBM
05:22for very long-context support –
05:24we believe that's what enables the low latency without all of the compromises that you would get otherwise.
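
To put rough numbers on the capacity argument, here is a sketch of a long-context KV-cache footprint against typical per-chip SRAM and HBM capacities; the model shape and capacity figures are illustrative assumptions, not numbers from the interview:

```python
# Rough sketch of why long context outgrows on-chip SRAM.
# Model shape and capacity figures are illustrative assumptions.

def kv_cache_bytes(context_len: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    # Keys and values are stored for every token, at every layer.
    return context_len * layers * kv_heads * head_dim * 2 * bytes_per_elem

sram_capacity = 256e6    # assumed a few hundred MB of on-chip SRAM per chip
hbm_capacity = 192e9     # assumed ~192 GB of HBM per chip

cache = kv_cache_bytes(context_len=500_000, layers=60, kv_heads=8, head_dim=128)
print(f"KV cache at 500k tokens: {cache / 1e9:.0f} GB")
print(f"fits in SRAM: {cache < sram_capacity}, fits in HBM: {cache < hbm_capacity}")
```

In this split, the fixed-size weights sit in fast SRAM for latency, while the much larger, context-dependent state lives in HBM.
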
05:29Reiner Pope, thank you so much for joining us today.