00:00Etched is emerging from stealth with an eye toward taking on chip heavyweights like NVIDIA
00:05as the industry shifts from training AI models to running them, or inference.
00:09The company has raised $800 million in funding out of the gate,
00:13including backing from the likes of Jane Street and TSMC-linked firm Venture Tech Alliance.
00:18Etched CEO Gavin Uberti joins us now.
00:22You have introduced a rack-scale system, very interesting split memory design, very focused on inference.
00:34We'll get into all the money you've raised and what you've been doing quietly for two years,
00:37but could we start on the system?
00:40What is Etched? What is it pitching?
00:42Well, Etched is building rack-scale inference systems.
00:44We're really excited today to go and unveil both our funding and two of our core technologies that make the
00:49tech work,
00:50what we call low-voltage inference and cluster-scale memory.
00:54When you think about inference, you think about pre-fill and decode.
00:57Yes.
00:58That is, reading in all the input data and producing those decode tokens.
01:02We're just showing an image of the system on the screen, the rack-scale inference system.
01:08What is it we're looking at? Break it down for us.
01:11So right here, you see one of our racks.
01:13This thing comes pre-assembled with 32 of our chips in it for our first-gen product.
01:17And what you have is our cluster-scale memory tech linking those chips.
01:21And this technology allows for very low latency communication between chips.
01:25And that then allows one chip to read and use the HBM and SRAM memory of other chips in its
01:33system.
01:33You have a very interesting story.
01:35You've clearly been busy for a couple of years,
01:37but you've raised a significant amount of money from a very interesting list of backers.
01:42And just reflect on that for a minute.
01:44I mean, how have you been able to convince a Jane Street of the effectiveness of the technology?
01:50And why raise capital at that level?
01:53Well, I think if you look at our backers, they are extremely technical.
01:56To go ahead and take a bet like this on somebody like myself and my co-founders,
02:01you need to go ahead and be very first principles driven,
02:04understand the tech very, very deeply,
02:06and see why the market is going to be so big
02:09and why this technology is fundamentally better.
02:11And our backers like Jane Street and Venture Tech Alliance have big chip teams,
02:16and they get it.
02:18We had reported, I think, that money was raised at around a $5 billion valuation.
02:23But what's astonishing is, you know, with respect,
02:26the brief history of the company, right?
02:28How long, when did you come up with the idea?
02:31What was that first couple of years of trying to get set up like?
02:34Well, it took a lot of work to get to this point.
02:37We're at a spot today where we have the rack-scale inference systems running,
02:40and we have this tech proven out.
02:43And while it's not a low valuation,
02:47the fact that we have the tech to back it up, I think, makes it make sense.
02:49I'm really interested by the specifics of the system.
02:53On the program of late, just by way of example,
02:57you know, the pitch of Cerebras is its insulation
03:00from what's happening in high bandwidth memory
03:02because it doesn't rely on high bandwidth memory.
03:06SRAM is what they have configured.
03:09How does that work for Etched and its system?
03:11Well, for us, we're able to go ahead
03:12and get more out of the same volume of HBM.
03:16And one of the ways we do it
03:17is a tech we call low-voltage inference.
03:19We want to be able to use the same HBM and the same SRAM
03:23to run way, way more tokens for more users.
03:26But the bottleneck typically is power.
03:29If you look at a GPU, it thermally throttles.
03:32You cannot fit more compute onto that same chip.
03:35So what we do is we lower the voltage a lot.
03:38We run at under half the voltage of typical NVIDIA GPUs.
03:42And as a result, we're able to get very large power savings.
03:45And that in turn means you can have more users on the same chip
03:49and get more mileage out of your same HBM bandwidth.
03:52If you think about the economic benefits
03:54for the customer and the user,
03:55then you would pitch this as being superior
03:58on essentially a dollar per token basis.
04:01Absolutely.
04:02We think about this as economies of scale.
04:04You can go have more infrastructure.
04:06It allows you to go ahead and get a much lower cost per user.
04:10And as the models get bigger,
04:12that cluster-scale memory tech
04:13is going to give more and more of an advantage.
04:15The world we're in is that, you know,
04:18for some time we've been talking about
04:21better performing systems coming for NVIDIA architecture, right?
04:25NVIDIA still has a technical monopoly in the market.
04:30What does the pathway look like for you to scale?
04:32And is there anything you can say about sort of real world workloads
04:36that are being run on your systems,
04:38real world revenues that you're already able to point to as evidence
04:42that this will be out there in a meaningful way?
04:45Well, we are running some benchmarks
04:47and we are seeing best in the world performance.
04:49But what gets me really excited...
04:50Outside of the lab environment, though?
04:52We've had some customers test the system.
04:54What gets me really excited here is production.
04:57To make a big difference,
04:59you have to build a lot of these products.
05:01And we have a world-class platform and production team.
05:03And that explains the capital that you raised?
05:06It's a key part of it.
05:07The team is such a massive part of our story.
05:09Around half of our platform team is from NVIDIA.
05:12We're lucky to have some of the best,
05:14including our VP,
05:15who ran NVIDIA's DGX and HGX team.