  • 4 days ago
As AI adoption accelerates worldwide, infrastructure efficiency has become a critical challenge. During this session, Khaled Maalej, CEO of VSORA, will present Jotunn8, the company’s purpose-built AI inference processor designed to deliver high-performance, energy-efficient AI computing at scale. Engineered specifically for inference workloads, Jotunn8 helps organizations optimize infrastructure efficiency, reduce total cost of ownership, and scale AI deployment through an architecture purpose-built for inference.

Transcript
00:01VSORA, a French startup from Paris, founded by Khaled Maalej.
00:08They build AI chips for inference.
00:12The chip is three times faster than the leading competitors, with half the energy.
00:21Made in Europe, ladies and gentlemen,
00:24please welcome Khaled Maalej, founder and CEO of VSORA.
00:37Good morning, ladies and gentlemen.
00:39Thank you for attending this presentation.
00:42My name is Khaled Maalej.
00:43I'm co-founder and CEO of VSORA.
00:47At VSORA, we are a fabless semiconductor company.
00:50We build chips for AI.
00:51The main focus is inference.
00:53We can speak a little bit about why we are doing this and why the focus today is really inference.
00:58This is really, really important.
01:04So we are a company acting globally.
01:07I mean, in general, when you start a semiconductor company,
01:10you have to address the market in the different regions.
01:14We do have offices around the world.
01:16The headquarters is based here in Paris, with different design centers spread across France.
01:20But we also have a presence in different places around the world.
01:26The company was founded by very experienced engineers who have worked together for a long time
01:32and who know how to build silicon, how to manufacture it, and how to sell it.
01:40We just got, actually, our first silicon back from the fab.
01:44Actually, I'm going to show this later on.
01:47And today we are roughly 50 employees around the world.
01:50The number is growing.
01:51And by the way, we are hiring now.
01:53So if you are willing to join the company, you are more than welcome.
01:59So why is inference important?
02:02If you look at what all the analysts are saying today,
02:05they're going to tell you that in two or three years,
02:0780% of data center processing power is going to be dedicated to inference.
02:13So this is where the big game is happening today.
02:16And this is actually the big challenge that our industry is facing these days.
02:23So three constraints today are preventing this market from really growing.
02:30The first one is the cost per token.
02:32And if you look, most of the companies running inference today
02:36are not really profitable.
02:38They are more or less being subsidized by VCs around the world.
02:42And this will have to change in order to really ramp up and start the mass deployment of inference.
02:48So the first element is the cost per token.
02:51This is really important.
02:52The latency as well.
02:54I mean, people using AI today in the different applications
02:58are probably going to see that the size of the context, and the latency with it, have increased significantly.
03:04Latency does not go well with cost.
03:06So if you constrain the latency, the cost per token increases.
03:10And then the third point is power.
03:12Power is really becoming a bottleneck today.
03:15I think you have heard, in many articles that were published, that we are a
03:20little bit limited by power availability when deploying data centers.
03:24This is real.
03:26And you see here the growth that AI is going to go through in the next few years.
03:30You see that the problem is not really going to be solved.
03:34It's going to get worse.
03:35So we have to address these three main points to solve and enable the mass deployment of AI.
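The remark that constraining latency raises the cost per token can be illustrated with a toy batching model. This is only a sketch under stated assumptions, not anything presented in the talk: it assumes each decoding step takes a fixed base time plus a small per-request time, and that one token per request is produced per step. The function names and all numbers are hypothetical.

```python
def max_batch_under_latency(latency_cap_ms, base_ms, per_request_ms):
    """Largest batch whose step time (base_ms + per_request_ms * batch) fits the cap."""
    if latency_cap_ms < base_ms + per_request_ms:
        return 0  # not even a single request fits within the cap
    return int((latency_cap_ms - base_ms) // per_request_ms)


def tokens_per_second(batch, base_ms, per_request_ms):
    """Chip throughput when every step emits one token per request in the batch."""
    if batch == 0:
        return 0.0
    step_ms = base_ms + per_request_ms * batch
    return batch * 1000.0 / step_ms


# Hypothetical timing model: 20 ms fixed cost per step, 0.5 ms per extra request.
relaxed = tokens_per_second(max_batch_under_latency(50, 20, 0.5), 20, 0.5)
strict = tokens_per_second(max_batch_under_latency(25, 20, 0.5), 20, 0.5)

# With the hardware cost fixed, cost per token scales inversely with throughput.
cost_ratio = relaxed / strict
```

In this toy model, halving the allowed step latency shrinks the usable batch, so the same chip produces fewer tokens per second and each token carries a larger share of the hardware cost.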
03:44So the point that is really important to look at is that most of the solutions that we are using
03:50today were not really built for AI.
03:53Okay.
03:53And if you look at the details of the performance of these solutions, you're going to see that in some
04:00situations, the utilization rate of the processing power is actually not very high.
04:06And this really plays an important role when it comes to the cost per token.
04:13So what we did at VSORA is completely change the data movement in the silicon:
04:21we are using a stream approach here, so that we can keep
04:27all the processing power of the silicon well occupied.
04:30And this is important because if you look at the cost per token today, 80% of the cost is
04:36coming from the amortization of your investment.
04:39You buy silicon, and you're going to use it for three or four years, I would say.
04:45And during these four years, you're going to generate a certain number of tokens.
04:49This actually gives you 80% to 90% of the cost of the token.
04:54It's not really OPEX.
04:56It's really your CAPEX that is driving the cost per token.
04:58So if you are capable of improving the processing efficiency of your silicon and thus getting many more tokens
05:07than the competition, then you can definitely reduce the cost per token significantly.
05:12This is actually where VSORA plays.
05:14This is where we have invented this architecture, which is running on silicon today.
05:19And you see here how the improvement can be significant in terms of efficiency.
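The amortization argument above can be written out as a short calculation. The sketch below is an illustration only: the function names, the purchase price, the operating cost, the peak throughput, and the utilization rates are all hypothetical placeholders, not VSORA or competitor figures. It shows that when CAPEX dominates, doubling the utilization of the same silicon halves the cost per token.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def cost_per_million_tokens(capex, lifetime_years, peak_tokens_per_s,
                            utilization, opex_per_year=0.0):
    """Total cost of ownership divided by the tokens generated over the lifetime."""
    tokens = peak_tokens_per_s * utilization * lifetime_years * SECONDS_PER_YEAR
    total_cost = capex + opex_per_year * lifetime_years
    return total_cost / tokens * 1e6


def capex_share(capex, lifetime_years, opex_per_year):
    """Fraction of the lifetime cost that comes from the initial investment."""
    return capex / (capex + opex_per_year * lifetime_years)


# Hypothetical accelerator: 30,000 purchase price, 1,500 per year to operate,
# four-year lifetime, 10,000 tokens/s at full utilization.
share = capex_share(30_000, 4, 1_500)
low_util = cost_per_million_tokens(30_000, 4, 10_000, 0.30, 1_500)
high_util = cost_per_million_tokens(30_000, 4, 10_000, 0.60, 1_500)
```

With these placeholder numbers the CAPEX share lands in the 80% to 90% range quoted in the talk, and `high_util` is half of `low_util`, because the same amortized investment is spread over twice as many tokens.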
05:28So the chip that we just released, actually, I can probably show you the chip because
05:35I have it here with me.
05:37This is the first European GPU actually.
05:41So the advantage, sorry, the advantage of these chips is that they are so huge that you can show them from the
05:47stage.
05:48So this chip here offers the same amount of memory that the big players in the US are offering today.
05:56And it actually offers much better efficiency.
05:59And this is what really helps reduce the cost per token that I mentioned before.
06:06The solution is fully programmable.
06:08This is a technology that we have been maturing for several years now.
06:12And this is a technology that actually we just announced yesterday that it's going to be used and deployed by
06:19Scaleway before the end of this year in some of the data centers as well.
06:24So this technology will become available for the end users in a very short time.
06:32So we have also built a demo.
06:34This is showing a little bit how it works.
06:38It's like what you are used to seeing on ChatGPT.
06:42On this one, you can also select the neural network that you want to run.
06:46This is Llama 2, Mistral, or any other.
06:49And you can just enter your query here and start seeing the answer.
06:55So for people who are able to read the answer, and who know
07:01what the score was between PSG and Arsenal,
07:05this will show you that sometimes you should not trust AI in every case.
07:09The answer here was that Arsenal was going to win, but in the end PSG was the winner.
07:14So this is a quick overview of what we are doing at VSORA.
07:21As for the positioning of the company today, we are the only European company providing this kind of technology.
07:27This is going to be available, again, for end users before the end of this year.
07:35And the goal of the company is really to address this big challenge for AI deployment related to the
07:40cost per token, the latency, and also the power consumption.
07:46Thank you very much for your attention. Thank you.
07:54We have two minutes or less. If we have questions, it's up to you.
08:01Ladies and gentlemen, sir.
08:10The question is, how do you compare to new chip designers like Cerebras, these new players who are doing inference
08:23as well?
08:23And then what is NVIDIA doing in this area? Maybe you know a bit.
08:29OK, we probably need much more time than just two minutes to address this question.
08:34So our positioning is that we address the full spectrum of AI models today.
08:41If you look at solutions like Cerebras and others, they are limited in terms of bandwidth.
08:45They made different technical choices compared to the ones we have made ourselves.
08:50In our case, we embed 288 gigabytes of memory, which is the maximum the industry can do today.
08:57And compared to the other players, we offer much better processing efficiency.
09:02And this is really where we differentiate ourselves.
09:05And the processing efficiency is the key, actually, to reducing the cost per token.
09:13Hello. So you said that you didn't need to use CUDA with these chips.
09:19Yes, I didn't mention that.
09:20And how do you replace CUDA? What do you use instead of CUDA?
09:25So, yeah. So the architecture itself, actually, is designed so that, natively in the silicon,
09:35we handle matrices, tensors, and vectors. CUDA is much lower level than that.
09:40So we code at a much higher level, which is very convenient for engineers.
09:44Because in general, they keep a really functional view of their code.
09:49They never bother knowing where a given component of a matrix is stored in memory.
09:54They just handle indices.
09:55It's like what we do in MATLAB, basically.
09:58So it's much simpler to code and optimize.
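The contrast drawn in this answer, between handling indices and tracking where each element sits in memory, can be illustrated in plain Python. This is a generic sketch, not VSORA's programming model and not CUDA code; both functions and their names are hypothetical. The first addresses elements by row and column only, while the second forces the programmer to compute row-major offsets by hand.

```python
def matmul_indices(a, b):
    """Functional view: elements are addressed purely by (row, column) indices."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(inner)) for j in range(cols)]
            for row in a]


def matmul_flat(a_flat, b_flat, n, k, m):
    """Lower-level view: the caller tracks row-major memory offsets explicitly."""
    out = [0] * (n * m)
    for i in range(n):
        for j in range(m):
            acc = 0
            for p in range(k):
                # Offsets i*k+p and p*m+j encode where each element is stored.
                acc += a_flat[i * k + p] * b_flat[p * m + j]
            out[i * m + j] = acc
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
by_index = matmul_indices(a, b)
by_offset = matmul_flat([1, 2, 3, 4], [5, 6, 7, 8], 2, 2, 2)
```

Both versions compute the same product. The difference is that a change of memory layout would require rewriting every offset in the second version, while the first would stay untouched.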
10:07Thank you for the presentation. So I would like to talk about pricing, if possible.
10:12So for a data center that wants to start installing infrastructure for AI, that wants to try, okay?
10:20It has two clients and wants to do one rack. How much does it cost?
10:28So it's difficult to give you numbers here.
10:31Approximately. I don't know. To encourage them to do it.
10:35Yeah. So our goal is really to significantly reduce the cost per token.
10:40And in general, what we can do is give you access to a small data center that we are building
10:46at our location.
10:47We are also going to have another data center being built with Scaleway that you can also have access
10:52to.
10:53So that you can benchmark yourself and see a little bit how this works for your business case.
11:03Any other questions? Okay.
11:08No? Thank you very much for your attention.
11:10It's a problem. Thank you, Khaled. Thank you so much.
11:12Thank you very much.