00:00Why would NVIDIA want to put money into you?
00:02How are you helping the GPU story?
00:06Yeah, thanks for having me.
00:08Well, we're helping companies get access to these great open-source models
00:14and we're building this purpose-built inference cloud.
00:19So really, you need to build specialized infrastructure
00:23to do inference efficiently.
00:25And it helps everyone access these great open-source AI models.
00:32And I mean, that's why NVIDIA was excited to participate in our round.
00:36We were just hearing that Cerebras,
00:39which also wants to speed up AI inference offerings,
00:43is going to be tapping the public markets.
00:45It sees itself as an NVIDIA competitor.
00:48How, Nicola, are you more adjacent to the NVIDIA story?
00:53How are you looking to help deploy GPUs?
00:55Why is your compute different?
01:00You know, we've tried a number of different inference accelerators,
01:04but we believe NVIDIA's hardware is still the most efficient,
01:08the best to do inference on.
01:11And we're just doubling down on that kind of hardware platform.
01:16And we like working closely with teams at NVIDIA
01:20and just making inference more efficient.
01:23Lowering the price per token is what we really like to focus on.
01:30And so that's kind of our path.
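To make the price-per-token focus concrete, here is a back-of-the-envelope sketch of how cost per token falls out of GPU cost and throughput. Both input numbers below are illustrative assumptions, not figures from the interview.

```python
# Hypothetical inputs -- illustrative assumptions, not figures from the interview.
gpu_cost_per_hour = 2.50    # assumed all-in $/hour for one GPU (hardware, power, data center)
tokens_per_second = 6_000   # assumed sustained per-GPU generation throughput

tokens_per_hour = tokens_per_second * 3600
cost_per_million_tokens = gpu_cost_per_hour / tokens_per_hour * 1_000_000
print(f"~${cost_per_million_tokens:.2f} per million tokens")  # ~$0.12 with these inputs
```

Under these assumptions, doubling throughput (better caching, batching, kernels) halves the price per token, which is why the software stack matters as much as the chips.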
01:33You know, there will be a lot of demand for inference down the line.
01:37We think that 80% of compute is going to go towards inference.
01:41But at the moment, you know,
01:44we've invested heavily in the NVIDIA hardware stack.
01:48What's interesting is you're already processing, what,
01:515 trillion tokens per week.
01:53How do you get the efficiencies?
01:55How do you drive down the cost per token?
02:00It's a lot of hard work.
02:01And we look at it through the whole stack:
02:06where we build these inference clusters,
02:09which data centers we go into, how we structure them,
02:13and then a lot of work in the software on top of it.
02:16We believe one of the key things
02:18is to have very good caching of tokens.
02:21As these agents interact with the AI models,
02:24they basically make many requests in a loop
02:28with roughly the same context.
02:30So I think KV caches are really key
02:33to the efficiency of inference.
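A minimal sketch of the caching idea described here: an agent loop re-sends roughly the same context on every step, so the serving layer can cache the attention key/value (KV) state for the shared prefix and skip re-computing it. The `PrefixKVCache` class below is a hypothetical stand-in, not any real engine's API; production systems (vLLM's prefix caching, for example) cache actual KV tensors per token block.

```python
import hashlib

class PrefixKVCache:
    """Toy cache: stores a stand-in 'KV state' keyed by a hash of the prompt prefix."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(prefix_tokens):
        return hashlib.sha256(str(prefix_tokens).encode()).hexdigest()

    def get_or_compute(self, prefix_tokens, compute_kv):
        key = self._key(prefix_tokens)
        if key not in self._store:            # cache miss: pay the full prefill cost once
            self._store[key] = compute_kv(prefix_tokens)
        return self._store[key]               # cache hit: the shared prefix costs nothing

# An agent loop re-sends the same system prompt + tool definitions each step,
# so every request after the first reuses the cached prefix state.
cache = PrefixKVCache()
shared_prefix = [101, 7592, 2088, 2003]       # stand-in token IDs for the shared context
for step in range(3):
    kv = cache.get_or_compute(shared_prefix, lambda toks: f"kv-state({len(toks)} tokens)")
    print(f"step {step}: got {kv} (computed once, reused afterwards)")
```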
02:37And, you know, we've been doing this for about four years.
02:42We saw the demand for inference coming,
02:45and we really focused on
02:48how to build a purpose-built inference cloud.
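For scale, the five trillion tokens per week cited above averages out to millions of tokens per second of sustained load, straightforward arithmetic on the interview's own figure:

```python
weekly_tokens = 5_000_000_000_000        # 5 trillion tokens/week, the figure from the interview
seconds_per_week = 7 * 24 * 3600         # 604,800 seconds
print(f"{weekly_tokens / seconds_per_week:,.0f} tokens/second on average")  # ≈ 8,267,196
```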
02:51And I think, therefore, you are scaling out.
02:53You're operating out of, was it, eight data centers
02:56already, I think?
02:57With this new money, $100 million more,
03:00where do you deploy that?
03:03We will use the capital from this fundraise
03:06to scale our platform.
03:08We will, you know, deploy more of the latest NVIDIA chips
03:13and then scale our inference platform across the U.S.
03:18But also, we're looking to expand in Europe and Asia
03:21later this year.
03:23Samsung is also among the strategic investors.
03:26It's not just GPUs from NVIDIA that one needs.
03:29You need an awful lot of the high-bandwidth memory.
03:31We know about memory costs soaring because of the AI demand.
03:34Is that why Samsung comes forward?
03:36How are you thinking about the supply chain headaches
03:39that you might, or might not, face?
03:42Yeah, you know,
03:45since the beginning of the year,
03:46it's been a struggle to find some of the chips
03:49that we need for our inference clusters.
03:52Memory and disks have been an issue.
03:55And so it's important for us to have, you know,
03:59good investors in our corner that are helping us,
04:02including Supermicro, Samsung, and NVIDIA,
04:07to make sure we have the right supply
04:09and we don't struggle as much.
04:10But the chip shortages are pretty real.
04:12And I feel like, you know,
04:17inference needs something like a million times more compute
04:20than traditional computing.
04:23And so we'll see more and more demand for chips,
04:28including CPUs, GPUs, and memory.