Space, Quantum & Frontier Technologies - Live Demo with VSORA

Vivatech

As AI adoption accelerates worldwide, infrastructure efficiency has become a critical challenge. During this session, Khaled Maalej, CEO of VSORA, will present Jotunn8, the company’s purpose-built AI inference processor designed to deliver high-performance, energy-efficient AI computing at scale. Engineered specifically for inference workloads, Jotunn8 helps organizations optimize infrastructure efficiency, reduce total cost of ownership, and scale AI deployment through an architecture purpose-built for inference.

Transcript

00:01VSORA, a French startup from Paris, founded by Khaled Maalej.

00:08They build AI chips for inference.

00:12The chip is three times faster than the leading competitors with half the energy.

00:21Made in Europe, ladies and gentlemen,

00:24please welcome Khaled Maalej, founder and CEO of VSORA.

00:37Good morning, ladies and gentlemen.

00:39Thank you for attending this presentation.

00:42My name is Khaled Maalej.

00:43I'm co-founder and CEO of VSORA.

00:47VSORA, we are a fabless semiconductor company.

00:50We build chips for AI.

00:51The main focus is inference.

00:53We can speak a little bit why we are doing this and why the focus is really inference today.

00:58This is really, really important.

01:04So we are a company acting globally.

01:07I mean, in general, when you start a semiconductor company,

01:10you have to address a little bit the market in the different regions.

01:14We do have offices around the world.

01:16The headquarters is based here in Paris with different design centers spread in France.

01:20But we also have a presence in the different places in the world.

01:26The company has been founded with very experienced engineers who used to work together for a long time,

01:32knowing how to build silicon, how to manufacture, I mean, these silicons and how to sell them.

01:40We just got, actually, our first silicon back from the fab.

01:44Actually, I'm going to show this later on.

01:47And today we are roughly 50 employees around the world.

01:50The number is growing.

01:51And by the way, we are hiring now.

01:53So if you are willing to join the company, you are more than welcome.

01:59So why inference is important.

02:02If you look to all the analysts, what they are saying today,

02:05they're going to tell you that in two or three years,

02:0780% of the data center processing power is going to be dedicated to inference.

02:13So this is where the big game is happening today.

02:16And this is actually the big challenge that our industry is facing these days.

02:23So the three constraints today are preventing this market from really growing up.

02:30The first one is the cost per token.

02:32And if you look, most of the companies today running inference,

02:36they are not really profitable.

02:38They are more getting a little bit subsidized by VCs around the world.

02:42And this will have to change in order really to ramp up and start the mass deployment of inference.

02:48So the first element is the cost per token.

02:51This is really important.

02:52The latency as well.

02:54I mean, probably people using today AI and the different applications,

02:58they're going to see that the size of the context and the latency on that has increased significantly.

03:04Latency does not go well with cost.

03:06So if you constrain the latency, the cost per token increases.

03:10And then the third point is power.

03:12Power is really becoming a bottleneck today.

03:15I think you heard about, I mean, in many articles that were published about the fact that we are a

03:20little bit limited by power availability to deploy data centers.

03:24This is real.

03:26And you see here the growth that AI is going to go through in the next few years.

03:30You see that the problem is not going to be really solved.

03:34It's going to get worse.

03:35So we have to address these three main points to solve and enable the mass deployment of AI.

03:44So the trick that is really important to look at is that most of the solution that we are using

03:50today, they were not really built for AI.

03:53Okay.

03:53And if you look to the details of the performance of this solution, you're going to see that in some

04:00situations, the utilization rate actually of the processing power is not really high.

04:06And this plays really an important role when it comes to the cost per token.

04:13So VSORA, what we did actually, we changed completely the data movement in the silicon in a way that actually

04:21we are using a stream approach here in a way that we can really occupy in a very good way

04:27all the processing power of the silicon.

04:30And this is important because if you look to the cost per token today, 80% of the cost is

04:36coming from the amortization of your investment.

04:39You buy a silicon, you're going to use it for three, five, four, three, four years, I would say.

04:45And during these four years, you're going to generate a certain number of tokens.

04:49This actually will give you 80% to 90% of the cost of the token.

04:54It's not really OPEX.

04:56It's really your CAPEX that is driving the cost per token.

04:58So if you are capable of improving the processing efficiency of your silicon and thus get actually much more tokens

05:07than the competition, then you can definitely reduce significantly the cost per token.

05:12This is actually where VSORA plays.

05:14This is where we have invented this architecture that is today running on silicon.

05:19And you see here how the improvement can be significant in terms of efficiency.
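
The amortization argument above can be sketched numerically. This is only an illustration under stated assumptions: the function names and every figure (purchase price, yearly operating cost, peak throughput, utilization rates) are hypothetical and are not VSORA or competitor data. The sketch shows that when CAPEX dominates the total cost, doubling the utilization of the same silicon halves the cost per token.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def capex_share(capex: float, lifetime_years: float, opex_per_year: float) -> float:
    """Fraction of the total cost of ownership that comes from the purchase price."""
    total_cost = capex + opex_per_year * lifetime_years
    return capex / total_cost


def cost_per_million_tokens(
    capex: float,
    lifetime_years: float,
    opex_per_year: float,
    peak_tokens_per_second: float,
    utilization: float,
) -> float:
    """Total cost of ownership divided by the tokens generated over the lifetime."""
    total_cost = capex + opex_per_year * lifetime_years
    tokens = peak_tokens_per_second * utilization * lifetime_years * SECONDS_PER_YEAR
    return total_cost / tokens * 1_000_000


if __name__ == "__main__":
    # Hypothetical accelerator: all numbers below are made up for illustration.
    capex, years, opex, peak = 30_000.0, 4.0, 1_500.0, 10_000.0

    print(f"CAPEX share of total cost: {capex_share(capex, years, opex):.0%}")
    for utilization in (0.3, 0.6):
        cost = cost_per_million_tokens(capex, years, opex, peak, utilization)
        print(f"utilization {utilization:.0%}: {cost:.4f} per million tokens")
```

With these made-up inputs the purchase price accounts for a bit over 80% of the total cost, in line with the range quoted in the talk, and the only lever that changes between the two runs is utilization.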

05:28So the chip that we just released, this is actually the, actually probably I can show you the chip because

I have it here with me.

05:37This is the first European GPU actually.

05:41So the advantage, sorry, the advantage of these chips is that they are so huge that you can show them from the

stage.

05:48So this chip here offers the same amount of memory that the big players in the US are offering today.

05:56And it offers much better efficiency actually.

05:59And this is what plays in reducing really the cost per token that I mentioned before.

06:06The solution is fully programmable.

06:08This is a technology that we have matured now for several years.

06:12And this is a technology that actually we just announced yesterday that it's going to be used and deployed by

06:19Scaleway before the end of this year in some of the data centers as well.

06:24So this technology will become available for the end users in a very short time.

06:32So we have also built a demo.

06:34This is showing a little bit how it works.

06:38It's like what you used to see on ChatGPT.

06:42On this one, we can select as well the neural network that you want to run.

06:46This is Llama, Mistral or any other.

06:49And you can just put your query, I mean file your query here and start actually seeing what the answer is.

06:55So for people who are capable of reading the answer, you're going to see that actually and who knows actually

07:01what the score was between PSG and Arsenal.

07:05This will let you know that finally sometimes you should not trust AI in all the cases.

07:09So the answer here was that Arsenal is going to win, but finally PSG was the winner.

07:14So this is a quick overview of what VSORA, we are doing.

07:21The positioning of the company today, we are the only European company providing this kind of technologies.

07:27This is going to be available again for the end users before the end of this year.

07:35And the goal of the company is really to address this big challenge for the AI deployment related to the

07:40cost per token, related to the latency and also the power consumption.

07:46Thank you very much for your attention. Thank you.

07:54We have two minutes or less. If we have questions, it's up to you.

08:01Ladies and gentlemen, sir.

08:10The question is, how do you compare to new chip designers like Cerebras, these new guys who are doing inference

as well?

08:23And then what is NVIDIA doing in this area? Maybe you know a bit.

08:29OK, we probably need much more time than just two minutes to address this question.

08:34So our positioning in a way that we address the full spectrum of the models today for AI.

08:41If you look to solutions like Cerebras and others, they are limited in terms of bandwidth.

08:45They made different technical choices compared to what we have made ourselves.

08:50In our case, we embed 288 gigabytes of memory, which is the maximum the industry can do today.

08:57And compared to the other players, we offer much better processing efficiency.

09:02And this is really where we differentiate ourselves.

09:05And the processing efficiency is the key, actually, to reduce the cost per token.

09:13Hello. So you said that you didn't need to use CUDA with these chips.

09:19Yes, I didn't mention that.

09:20And how do you do to replace CUDA? What do you use instead of CUDA?

09:25So, yeah. So the architecture itself, actually, is done in a way that we natively in the silicon,

09:35we handle matrices, tensors, and vectors. CUDA is much lower level than that.

09:40So we code at a much higher level, which is very convenient for engineers.

09:44Because in general, they keep really a functional view of their code.

09:49They never bother knowing where this component of that matrix is stored in the memory.

09:54They just handle indices.

09:55It's like what we do in MATLAB, basically.

09:58So it's much more simple to code and optimize.
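
The contrast drawn in this answer, between handling memory locations yourself and keeping a functional, index-based view of a matrix, can be illustrated with a small sketch. It is plain Python, not CUDA and not VSORA's toolchain, which the talk does not describe; the function names are hypothetical and chosen only for this example.

```python
from typing import List


def matvec_flat(buf: List[float], rows: int, cols: int, x: List[float]) -> List[float]:
    """Low-level view: the matrix is a flat row-major buffer and the
    programmer computes the storage offset of every element by hand."""
    y = [0.0] * rows
    for r in range(rows):
        for c in range(cols):
            y[r] += buf[r * cols + c] * x[c]  # explicit offset into memory
    return y


def matvec_indexed(a: List[List[float]], x: List[float]) -> List[float]:
    """Functional view: the code only refers to indices (r, c) and never
    to where an element is stored."""
    return [sum(a[r][c] * x[c] for c in range(len(x))) for r in range(len(a))]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    flat = [value for row in a for value in row]  # same matrix, flattened row by row
    x = [1.0, 1.0]
    # Both views compute the same matrix-vector product.
    print(matvec_flat(flat, 2, 2, x))
    print(matvec_indexed(a, x))
```

Both functions compute the same product; the second never mentions layout, which is the property the speaker compares to MATLAB.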

10:07Thank you for the presentation. So I will talk about pricing, if possible.

10:12So for a data center who wants to start installing infrastructure about AI, he wants to try, okay?

10:20He has two clients, and he wants to do one rack. How much does it cost for him?

10:28So it's difficult to give you numbers here.

10:31Approximately. I don't know. To encourage them to do it.

10:35Yeah. So our goal is really to reduce significantly the cost per token.

10:40And in general, what we have, we can give you access to a small data center that we are building

10:46in our location.

10:47We are also going to have another data center being built with Scaleway that you can have also access

to.

10:53So that you can benchmark yourself and see a little bit how this works for your business case.

11:03Any other questions? Okay.

11:08No? Thank you very much for your attention.

11:10It's a problem. Thank you, Khaled. Thank you so much.

11:12Thank you very much.
