00:00Why would NVIDIA want to put money into you?
00:02How are you helping the GPU story?
00:06Yeah, thanks for having me.
00:08Well, we're helping companies get access to these great open-source models
00:14and we're building this purpose-built inference cloud.
00:19So really, you need to build specialized infrastructure
00:23to do inference efficiently.
00:25And it helps everyone access these great open-source AI models.
00:32And I mean, that's why NVIDIA was excited to participate in our round.
00:36We were just hearing that Cerebras,
00:39which also wants to speed up AI inference offerings,
00:43is going to be tapping the public markets.
00:45It sees itself as an NVIDIA competitor.
00:48How, Nicola, are you more adjacent to the NVIDIA story?
00:53How are you looking to help deploy GPUs?
00:55Why is your compute different?
01:00You know, we've tried a number of different inference accelerators,
01:04but we believe NVIDIA's hardware is still the most efficient,
01:08the best to do inference on.
01:11And we're just doubling down on that kind of hardware platform.
01:16And we like working closely with teams at NVIDIA
01:20and just making inference more efficient.
01:23Lowering the price per token is what we really like to focus on.
01:30And so that's kind of our path.
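To make the price-per-token focus concrete, here is a back-of-the-envelope sketch of how cost per token falls out of GPU cost and throughput. Both input numbers below are illustrative assumptions, not figures from the interview.

```python
# Hypothetical inputs -- illustrative assumptions, not figures from the interview.
gpu_cost_per_hour = 2.50    # assumed all-in $/hour for one GPU (hardware, power, data center)
tokens_per_second = 6_000   # assumed sustained per-GPU generation throughput

tokens_per_hour = tokens_per_second * 3600
cost_per_million_tokens = gpu_cost_per_hour / tokens_per_hour * 1_000_000
print(f"~${cost_per_million_tokens:.2f} per million tokens")  # ~$0.12 with these inputs
```

Under these assumptions, doubling throughput (better caching, batching, kernels) halves the price per token, which is why the software stack matters as much as the chips.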
01:33You know, there will be a lot of demand for inference down the line.
01:37We think that 80% of compute is going to go towards inference.
01:41But at the moment, you know,
01:44we've invested heavily in the NVIDIA hardware stack.
01:48What's interesting is you're already processing, what,
01:515 trillion tokens per week.
01:53How do you get the efficiencies?
01:55How do you drive down the cost per token?
02:00It's a lot of hard work.
02:01And we look at it through the whole stack:
02:06where we build these inference clusters,
02:09which data centers we go into, how we structure them,
02:13and then a lot of work in the software on top of it.
02:16We believe one of the key things
02:18is to have very good caching of tokens.
02:21As these agents interact with the AI models,
02:24they basically make many requests in a loop
02:28with roughly the same context.
02:30So I think KV caches are really key
02:33to the efficiency of inference.
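A minimal sketch of the caching idea described here: an agent loop re-sends roughly the same context on every step, so the serving layer can cache the attention key/value (KV) state for the shared prefix and skip re-computing it. The `PrefixKVCache` class below is a hypothetical stand-in, not any real engine's API; production systems (vLLM's prefix caching, for example) cache actual KV tensors per token block.

```python
import hashlib

class PrefixKVCache:
    """Toy cache: stores a stand-in 'KV state' keyed by a hash of the prompt prefix."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(prefix_tokens):
        return hashlib.sha256(str(prefix_tokens).encode()).hexdigest()

    def get_or_compute(self, prefix_tokens, compute_kv):
        key = self._key(prefix_tokens)
        if key not in self._store:            # cache miss: pay the full prefill cost once
            self._store[key] = compute_kv(prefix_tokens)
        return self._store[key]               # cache hit: the shared prefix costs nothing

# An agent loop re-sends the same system prompt + tool definitions each step,
# so every request after the first reuses the cached prefix state.
cache = PrefixKVCache()
shared_prefix = [101, 7592, 2088, 2003]       # stand-in token IDs for the shared context
for step in range(3):
    kv = cache.get_or_compute(shared_prefix, lambda toks: f"kv-state({len(toks)} tokens)")
    print(f"step {step}: got {kv} (computed once, reused afterwards)")
```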
02:37And, you know, we've been doing this for about four years.
02:42We saw the demand for inference coming,
02:45and we really focused on
02:48how to build a purpose-built inference cloud.
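For scale, the five trillion tokens per week cited above averages out to millions of tokens per second of sustained load, straightforward arithmetic on the interview's own figure:

```python
weekly_tokens = 5_000_000_000_000        # 5 trillion tokens/week, the figure from the interview
seconds_per_week = 7 * 24 * 3600         # 604,800 seconds
print(f"{weekly_tokens / seconds_per_week:,.0f} tokens/second on average")  # ≈ 8,267,196
```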
02:51And I think, therefore, you are scaling out.
02:53You're operating out of, was it, eight data centers
02:56already, I think?
02:57With this new money, $100 million more,
03:00where do you deploy that?
03:03We will use the capital from this fundraise
03:06to scale our platform.
03:08We will, you know, deploy more of the latest NVIDIA chips
03:13and then scale our inference platform across the U.S.
03:18But also, we're looking to expand in Europe and Asia
03:21later this year.
03:23Samsung is also among the strategic investors.
03:26It's not just GPUs from NVIDIA that one needs.
03:29You need an awful lot of the high-bandwidth memory.
03:31We know about memory costs soaring because of the AI demand.
03:34Is that why Samsung comes forward?
03:36How are you thinking about the supply chain headaches
03:39that you might, or might not, face?
03:42Yeah, you know,
03:45since the beginning of the year,
03:46it's been a struggle to find some of the chips
03:49that we need for our inference clusters.
03:52Memory and disks have been an issue.
03:55And so it's important for us to have, you know,
03:59good investors in our corner that are helping us,
04:02including Supermicro, Samsung, and NVIDIA,
04:07to make sure we have the right supply
04:09and we don't struggle as much.
04:10But the chip shortages are pretty real.
04:12And I feel like, you know,
04:17inference needs something like a million times more compute
04:20than traditional computing.
04:23And so we'll see more and more demand for chips,
04:28including CPUs, GPUs, and memory.