Beyond Language Models: Building AI that Understands the World

Vivatech

Large language models have transformed how people interact with AI, but are they enough to achieve truly intelligent systems? Yann LeCun has long argued that the next breakthroughs will require machines capable of understanding the physical world, reasoning about their environment, and learning with far less human supervision. In this conversation, he shares his vision for the future of AI, the limitations of today's approaches, and the scientific challenges that remain unsolved.

Transcript

00:10Listen to those cheers. You're a hero here, right?

00:22Is this the definition of a hometown crowd?

00:26It looks like it, but that's my first experience with it. I hope I'll actually do the thing that deserves

00:33the cheers.

00:34I think you already have.

00:35All right.

00:38As we heard, you are not only a pioneer of AI and winner of the prestigious Turing Award, NYU professor,

00:50neighborhood guy in New York, my hometown.

00:53But now you've left Meta and started a new company, Advanced Machine Intelligence Labs, which could mean anything, right?

01:05I think there's probably a thousand companies that could be named that.

01:09Something like that.

01:10But yours is different. And before we talk specifically about it, I think what this is about is somewhat of

01:19a reaction to what the current paradigm in AI is, these large language models, which you are not particularly thrilled

01:31about.

01:31You don't feel that they're all they're cracked up to be, correct?

01:35I'm actually super thrilled about it. I mean, there's nothing wrong with LLMs. They're useful.

01:41A lot of people in this room are using them. I use them. I'm sure you do.

01:44I do use them, yeah. At the right.

01:47So nothing wrong with that. I mean, there's a lot of computer technology that is very useful that is simply

01:52not a path to human level intelligence.

01:54So what I'm, you know, kind of speaking against is the idea that somehow we're going to take LLMs and

02:01scale them up and we're going to, you know, reach human level intelligence with this.

02:05To some extent, in some areas, those things have already reached superhuman intelligence in, I don't know, language translation,

02:13some aspect of code generation, not everything, right?

02:16You know, it can write code, but designing a whole software system is another story.

02:22So those things are useful. They're powerful. They are superhuman in some domains, but they have limits.

02:29And the idea that somehow those limits are going to be expanded to kind of cover all the things that

02:36humans do is just false.

02:39Well, you've spent, you know, over a decade in the belly of the beast, you know, Meta, one of the

02:47companies that is pursuing, you know, super intelligence, and the dominant companies in AI seem fixated on this AGI idea.

02:57They're super smart people. How do they go wrong on this?

03:02Okay, so there's a term to designate people who believe that we're going to, you know, scale up LLMs and

03:08get to super intelligence.

03:10They're called LLM-pilled, right? They took the pill. And now they're hypnotized.

03:16And to some extent, a lot of the culture in Silicon Valley or in the AI industry is LLM-pilled.

03:22And the funny thing is that because all the companies are poaching each other's engineers and they're all working on

03:31the same thing,

03:32they can't really afford to sort of deviate from the mainstream because they run the risk of falling behind.

03:38And so that creates this monoculture. But I think it's a bit of a delusion.

03:45And, you know, I think there is a requirement for, you know, new paradigms to kind of go beyond the

03:51limitations of current systems.

03:52And there are many ways to get to this new paradigm, and perhaps building up LLMs is one way, but I

03:58think it's kind of a slow way.

03:59So AMI Labs is really built around the idea of sort of leapfrogging.

04:05You know what? There's no word in French for leapfrogging, right?

04:09Which is crazy because we're frogs, right? But, so, you know, let's directly build the thing we want to build

04:18instead of sort of, you know, going in circles to try to reach that.

04:24And I think we'll, you know, lift the limitations which, you know, for me are completely obvious.

04:29Yeah. I mean, you know, so sitting here, did you feel, you know, for the past few years,

04:36wait a minute, I'm on this train that's not going in the right direction?

04:40No, because I was going in my own direction; I had my own train track.

04:45The project I was working on has been my obsession for a long time, at least 15 years,

04:51but more intensely over the last five. Internally at Meta, we called it AMI, Advanced Machine Intelligence.

04:59Okay, that became the name of the company, but it was the name of the project internally at FAIR.

05:03This project had the support of the leadership of FAIR.

05:07The big secret is I was not the leader of FAIR, I was, you know, a researcher, essentially.

05:13Yeah.

05:14It was led by Joelle Pineau until she left last year.

05:18So, it was an important project for FAIR.

05:21It had the support of Mark Zuckerberg, of Andrew Bosworth, the CTO, and of Mike Schroepfer, the former CTO.

05:27So, the leadership was supporting it, but what happened last year is that the company chose to refocus its effort,

05:35including some of its research efforts, towards LLMs and catching up with the rest of the industry.

05:41It was still willing to support my project, but the general ambience was not as favourable,

05:47even though I had the support of the leadership of the company.

05:51And we realised also that our techniques were starting to work really well,

05:55that most of the applications were in industry,

05:59and those are domains that Meta is not very interested in.

06:03Meta is really about connecting people with each other.

06:05That's the only thing they care about, at least for now.

06:07And so, you know, it probably made sense to just leave the company, try to accelerate the development,

06:17and, you know, build the things we can build with it,

06:20and I realised I could also raise enough money to keep this going.

06:25What was the conversation like when you, you know, decided to make that split with Mark Zuckerberg?

06:32Well, there was a series of conversations, like, you know, starting late 2024,

06:37when I, there was sort of a bit of a conundrum because the Gen AI organisation,

06:44which was working on Llama, was really focused on very short-term objectives.

06:49And to some extent, there was a bit of a distance and impedance mismatch with FAIR,

06:54which was really about fundamental research.

06:56And so I told Mark that we should probably have an organisation in between

06:59that sort of bridges the gap between the two things,

07:04and that would give the opportunity to FAIR to really work on ambitious, longer-term projects.

07:11One of those two recommendations was actually implemented.

07:15So, Mark, in 2025, created this intermediate organisation, which is called TBD Lab.

07:22But then what happened to FAIR is that FAIR was actually pushed into contributing to TBD Lab.

07:29So, actually refocusing on short-term objectives.

07:32And that went against what I thought.

07:34There were also ideas about, you know, being less open and putting restrictions on publications,

07:39which I disliked.

07:41So, it was a combination of things, really.

07:43But there's no, you know, I'm not angry or bitter or anything or whatever.

07:48I have good relations with Mark and Boz.

07:50Yeah.

07:51Another thing that happened at Meta is, you know, you were and are a very big supporter of open source.

07:57Yeah.

07:58And, you know, for a while Meta was known as, you know, the one big company that really put a

08:05stake in the ground for open source,

08:07that built its models that way.

08:09Right.

08:09And then now not so much.

08:11Right.

08:12Yeah.

08:13Yeah.

08:14I mean, I think there were long discussions after the release of Llama 1, which was not really open.

08:23Llama 2 was being worked on.

08:24And the question was, should Llama 2 be open source?

08:27Internal discussions, you know, with Mark Zuckerberg and 40 people from the leadership,

08:35every week, debating that topic. Those were real questions.

08:41So I was arguing vehemently for open sourcing Llama 2.

08:47And so did Andrew Bosworth.

08:48I mean, it's not like I was the only one, but certainly fairly vocal on this.

08:52And at some point, Mark said, yeah, we're going to open source Llama 2.

08:56Tell me how.

08:59And that was a watershed moment because when Llama 2 was open sourced, a lot of people realized,

09:05oh, wow, we can build a whole industry around this, right?

09:09And a lot of startups really were made possible by the open sourcing of Llama 2.

09:17And it's a bit sad.

09:19I think that, you know, now the ethos has changed.

09:22But, okay, it's competitive.

09:25Yeah.

09:25I mean, Meta at one point was kind of a beacon carrying the torch of open

09:32source.

09:33Now people look to China.

09:34Carrying the PyTorch.

09:37Very good.

09:39That joke.

09:40Yeah, yeah.

09:41That's for the, you know, geeks in the audience.

09:46The torch has dropped.

09:49Now I guess it's to Chinese companies that people look for open source.

09:53Yeah.

09:54So a lot of people, particularly people who are close to the US government, which for

10:01all of its characteristics is actually somewhat in favor of open source AI,

10:07see that American industry is not doing open source anymore and that the open

10:14source scene is basically entirely occupied at least at the frontier by Chinese companies.

10:21And what you see in the utilization of, you know, AI platforms, there's a huge increase

10:29in the use of Chinese models, which are pretty good.

10:33And they're cheap.

10:34And, you know, cheap to run.

10:36Right.

10:36They're free if they're open.

10:38So that creates a kind of conflict.

10:42Yeah.

10:43And in the middle of this you have, you know, companies like Anthropic and a few others

10:49that are basically lobbying to essentially make open source illegal, right?

10:54Because they think AI is intrinsically dangerous.

10:58This is like one of the huge debates where I have a very strong opinion, as you probably

11:02know.

11:04We might talk about this a little bit.

11:06Yeah.

11:07I mean, I think we are in a debate similar to what was probably happening in the 15th century

11:13when the printing press was invented, and the Catholic Church didn't like it.

11:19The Ottoman Empire hated it.

11:23But, you know, in Europe, at least they couldn't stop it.

11:26Right.

11:27So people were printing books and people were able to, like, read the Bible by themselves.

11:33That caused the Protestant movement in Europe, 200 years of religious conflicts.

11:40Okay.

11:40But also people, like, learned to read because now they had something to read.

11:46And ideas about, you know, science and philosophy and democracy and things like that were disseminated.

11:52So that brought about the Enlightenment and the American and French revolutions.

11:58The, basically, destruction of the feudal system in Europe, right?

12:01So, I mean, those things have huge effects.

12:03And it's a cultural medium that disseminates knowledge.

12:08Now, what is AI today?

12:10AI today, LLMs, are a way to disseminate knowledge.

12:14They're not yet a way to generate new knowledge.

12:18They're really a way of kind of building repositories of all human knowledge

12:23and then giving access to people in a sort of interactive way.

12:27So it's just another step in, you know, making people smarter by disseminating knowledge.

12:32So if you're going to block this because you think it's dangerous, you are being a medieval obscurantist.

12:40And I think that's insane.

12:43Right.

12:45So you're saying, you know, I mean, to take it to what's happening with Anthropic right now, you feel that,

12:52you know, saying this model is too powerful, we can't have this.

12:57Right.

12:57That's wrong.

12:58And they're, you know, basically, you know, in the role of the Catholic Church calling Galileo a heretic?

13:06Yeah.

13:07So they want to preserve the dogma.

13:08Maybe they want to control what people can do with their tools.

13:12You know, if I buy a pen, you know, I don't want the company that builds the pen to tell

13:17me what I can write with it.

13:18And I'm not a writer.

13:19You are.

13:22So, you know, there's a limit to what a company should be able to do with or authorize others to

13:31do with their tools.

13:32But I think, you know, the dissemination of knowledge and culture is intrinsically good.

13:40There are dangers attached to it.

13:42You mitigate them.

13:44But otherwise, it's just control.

13:47And I think it comes also from, I mean, there's a discourse where, okay, you know, it's too dangerous to

13:52give AI to everyone.

13:55But we can do it.

13:56Like, we can use it.

13:57Because, you know, we're smarter than everyone, right?

13:59I mean, there's a big arrogance and superiority complex in the idea that only a few are capable of controlling

14:06AI and the unwashed masses should not have access to it.

14:12Well, you don't buy that it's dangerous?

14:16That if we put this stuff out, people are going to use it and, you know, just destroy

14:23the whole security infrastructure and our...

14:27Well, that's quite a shrug.

14:29I don't think any computer security...

14:31I'm not a computer security expert, but I don't think any computer security expert really believes this.

14:37There are dangers, but you just, you know, correct for them.

14:40I mean, it's just like, you know, when email first appeared, there were scams and there were all kinds of

14:45stuff.

14:45And, you know, and spam and everything.

14:48Right.

14:48And, you know, we found solutions to them.

14:49Okay.

14:50So, let's talk about...

14:52Yes.

14:53Okay.

14:55What's the, you know, the basis of it?

14:58You're going to create these world models.

15:01And what's the advantage of pursuing world models that we're not getting with LLMs?

15:07Okay.

15:07Two things.

15:08First thing is, if you want to build an agentic system, and everybody has been talking about agentic systems, right?

15:13So, an agentic system is a system that produces actions, either in the real world, the physical world, if it's

15:19a robot, or in the digital world.

15:22In my opinion, you cannot build a reliable agentic system without this system having the ability to anticipate the outcome

15:33and the consequences of its own actions.

15:37We have this.

15:38Most of us.

15:40Maybe some politicians don't.

15:42But we are certainly capable of anticipating the outcome resulting from our actions, and that's what allows us to plan

15:50a sequence of actions to accomplish a task.

15:54And so, that's what a world model is.

15:56Given the state of the world at time t, given an action that you imagine taking, can you predict the

16:00state of the world at time t plus one?

16:02Where one means, you know, ten milliseconds, a second, a minute, an hour, or ten years.

16:07Yeah.

16:08That's what a world model is.
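To make that definition concrete, here is a minimal Python sketch of the interface described above: a function that maps the state at time t and an action to a predicted state at time t plus one, chained to imagine the outcome of a sequence of actions before taking any of them. The hand-written toy_world_model (a point mass whose action is an acceleration) and the rollout helper are hypothetical stand-ins for a learned model; they are not AMI Labs code.

```python
from typing import Callable, List, Tuple

# Hypothetical toy state: (position, velocity) of a point mass on a line.
State = Tuple[float, float]
WorldModel = Callable[[State, float], State]


def toy_world_model(state: State, action: float, dt: float = 0.1) -> State:
    """Predict the state at time t + dt from the state at time t and an action.

    Hand-written stand-in for a learned predictor: the action is an
    acceleration applied for one step of length dt.
    """
    position, velocity = state
    next_velocity = velocity + action * dt
    next_position = position + next_velocity * dt
    return (next_position, next_velocity)


def rollout(model: WorldModel, initial: State, actions: List[float]) -> List[State]:
    """Imagine the states produced by a sequence of actions without executing them."""
    states = [initial]
    for action in actions:
        states.append(model(states[-1], action))
    return states


if __name__ == "__main__":
    trajectory = rollout(toy_world_model, (0.0, 0.0), [1.0, 1.0, 0.0])
    for t, (position, velocity) in enumerate(trajectory):
        print(f"step {t}: position={position:.3f} velocity={velocity:.3f}")
```

The step length dt is a parameter because, as noted above, one step can stand for ten milliseconds or ten years depending on the level of abstraction.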

16:10And so, that's the first thing.

16:13Now, people say, well, I can do this with an LLM.

16:16I can give a text description of the world, and then a text description of an action, and then a

16:22text description of the next state of the world, and then train a system to just predict the next description

16:27from the first two, right?

16:28An LLM, in principle, could do this.

16:30It can do this as long as you can describe the world in terms of sequences of discrete symbols.

16:38And the reason for this is predicting discrete symbols is difficult.

16:45You can never predict exactly which word will follow a sequence of words.

16:48But you can predict a probability distribution over all the possible words in your dictionary.

16:52And then by autoregressive prediction, you imply a distribution over sequences, right?

16:57That's how LLMs work.
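The mechanism described here, a probability distribution over a finite vocabulary at each step that, chained together, implies a distribution over whole sequences, fits in a few lines of Python. The four-word vocabulary and the bigram logit table TOY_LOGITS are invented for illustration; a real LLM computes the logits with a neural network conditioned on the entire prefix.

```python
import math
import random
from typing import Dict, List

VOCAB = ["the", "cat", "sat", "<eos>"]

# Hypothetical toy "model": next-token logits depend only on the previous token.
# A real LLM computes these logits with a network conditioned on the whole prefix.
TOY_LOGITS: Dict[str, List[float]] = {
    "<bos>": [2.0, 0.5, 0.1, -1.0],
    "the": [-1.0, 2.0, 0.2, -1.0],
    "cat": [-1.0, -1.0, 2.0, 0.0],
    "sat": [0.0, -1.0, -1.0, 2.0],
    "<eos>": [0.0, 0.0, 0.0, 0.0],  # defined only so every prefix has a distribution
}


def softmax(logits: List[float]) -> List[float]:
    """Turn scores into a probability distribution over the vocabulary."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def next_token_distribution(prefix: List[str]) -> List[float]:
    previous = prefix[-1] if prefix else "<bos>"
    return softmax(TOY_LOGITS[previous])


def sequence_log_prob(tokens: List[str]) -> float:
    """Chain rule: log P(w_1..w_T) = sum over t of log P(w_t | w_1..w_{t-1})."""
    total = 0.0
    for t, token in enumerate(tokens):
        probs = next_token_distribution(tokens[:t])
        total += math.log(probs[VOCAB.index(token)])
    return total


def sample(max_len: int = 10, seed: int = 0) -> List[str]:
    """Autoregressive generation: draw a token, append it, condition on it, repeat."""
    rng = random.Random(seed)
    tokens: List[str] = []
    while len(tokens) < max_len:
        token = rng.choices(VOCAB, weights=next_token_distribution(tokens), k=1)[0]
        tokens.append(token)
        if token == "<eos>":
            break
    return tokens


if __name__ == "__main__":
    print(sample())
    print(sequence_log_prob(["the", "cat", "sat", "<eos>"]))
```

Because each conditional distribution sums to one, the probabilities of all sequences of a given length also sum to one. This only works because the vocabulary is finite and discrete, which is the point being made about continuous sensor data.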

16:59You simply cannot do this with real data.

17:02So, if you have data from sensors that are continuous, high dimensional, and noisy, you simply cannot predict what's going

17:10to happen.

17:11The example I use all the time, which is particularly appropriate in this room, is if I take a video

17:15of this room, and I turn the camera slowly, I stop the video here, and I ask the system to

17:22predict what's going to happen next in the video,

17:24you can't predict there's going to be more people sitting in the chairs, you can't predict what the size of

17:28the room is, and there's no way you can predict what all of you look like.

17:32Absolutely no way.

17:33So, if you train a system, a generative architecture like an LLM, to predict in every detail at the pixel

17:40level, you know, the continuation of the video, it will fail.

17:45It's just not possible.

17:46And in fact, it's a mathematically intractable problem because we don't know how to represent distributions over an infinite number

17:54of plausible outcomes.

17:55And so, the idea that we're building AMI Labs on is the idea that the system should find an abstract

18:05representation of the observations and make predictions in that abstract representation space without attempting to reconstruct all the details which

18:15really are not predictable.

18:17It's very fundamental.

18:18How hard is that to do?

18:19How long a project is that before you have something, you know, that's, you know, you say you've been working

18:25on it at Meta, but how close are you to actually saying, okay, here it is, you know, folks, let's

18:31do it?

18:31Well, I've been working on the idea of using self-supervised learning by video prediction for about 15 years, mostly

18:37failing for the first 10, because I was using generative models trying to predict at the pixel level.

18:42And then five years ago, we came up with a number of ideas, and also that was prompted by some

18:48results in image recognition, where the best architectures that are trained by self-supervised learning to learn image representations do

19:00not attempt to reconstruct.

19:01They're called joint embedding architectures.

19:03And so it was pretty clear to me, it was a clear message that architectures that are trained to reconstruct

19:10don't work, and the ones that are trained to just learn a good representation that has some interesting property work

19:14better.

19:15So I said, well, we should apply this to video too.

19:17And so that's what happened about five years ago.

19:19There's a difficulty, a technical difficulty there, which is to prevent the system from collapsing.

19:26So when you tell the system, okay, here is a piece of video, find a good encoding for it, and

19:31then here is the following piece of video, encode it as well, and now predict the representation of that second

19:37chunk of video from the first one.

19:39If you train the whole thing together without being careful, and you just train it to minimize prediction error, the system

19:45will just ignore the inputs, produce constant representations, and now the prediction problem is trivial, but the thing doesn't do

19:51anything for you.
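The collapse failure mode is easy to reproduce in a toy setting. In this hypothetical Python sketch, an encoder maps a chunk of "video" (here just a short list of numbers) to an embedding, a predictor maps the embedding of one chunk to a guess for the embedding of the next, and the loss is the squared prediction error in embedding space. The encoders, the fixed identity predictor, and the data are all invented for illustration. A constant encoder drives the loss to exactly zero while carrying no information about its input, which is why minimizing prediction error alone is not enough.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[Sequence[float], Sequence[float]]  # (current chunk, following chunk)


def prediction_loss(
    encoder: Callable[[Sequence[float]], Vector],
    predictor: Callable[[Vector], Vector],
    pairs: List[Pair],
) -> float:
    """Mean squared error between the predicted and actual embedding of the next chunk."""
    total = 0.0
    for current, following in pairs:
        predicted = predictor(encoder(current))
        target = encoder(following)
        total += sum((p - t) ** 2 for p, t in zip(predicted, target))
    return total / len(pairs)


def informative_encoder(chunk: Sequence[float]) -> Vector:
    # Hypothetical encoder: summarize a chunk by its mean and its last value.
    return [sum(chunk) / len(chunk), chunk[-1]]


def collapsed_encoder(chunk: Sequence[float]) -> Vector:
    # Ignores its input entirely: every chunk is mapped to the same point.
    return [0.0, 0.0]


def identity_predictor(z: Vector) -> Vector:
    # Fixed for simplicity; in a real JEPA the predictor is trained as well.
    return list(z)


# Made-up one-dimensional "video" split into consecutive chunks.
PAIRS: List[Pair] = [
    ([0.0, 1.0], [2.0, 3.0]),
    ([5.0, 4.0], [3.0, 2.0]),
    ([1.0, 1.0], [1.0, 1.0]),
]

if __name__ == "__main__":
    print("informative encoder loss:", prediction_loss(informative_encoder, identity_predictor, PAIRS))
    print("collapsed encoder loss:  ", prediction_loss(collapsed_encoder, identity_predictor, PAIRS))
```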

19:53So how do you prevent this collapse?

19:56And about, you know, four or five years ago, we discovered a series of techniques, basically kind of a new

20:00way of doing this.

20:02One of them is called SIGReg, and another one is called VCReg.

20:05They're basically based on the idea of maximizing some estimation of information quantity, and those work really well.
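As one concrete illustration of an anti-collapse term, the sketch below computes simplified variance and covariance penalties of the kind used in VICReg (Variance-Invariance-Covariance Regularization) on a batch of embeddings. The variance term is a hinge that fires when the standard deviation of any embedding dimension drops below a target; the covariance term penalizes off-diagonal covariances so that dimensions do not duplicate each other. The target of 1.0, the epsilon, and the omission of the weighting coefficients and of the invariance or prediction term are simplifying assumptions, and SIGReg, which works differently, is not shown.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are samples in a batch, columns are embedding dimensions


def variance_penalty(z: Matrix, target_std: float = 1.0, eps: float = 1e-4) -> float:
    """Hinge on the per-dimension standard deviation, averaged over dimensions."""
    n, d = len(z), len(z[0])
    penalty = 0.0
    for j in range(d):
        column = [row[j] for row in z]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / (n - 1)
        penalty += max(0.0, target_std - math.sqrt(var + eps))
    return penalty / d


def covariance_penalty(z: Matrix) -> float:
    """Sum of squared off-diagonal covariances, divided by the number of dimensions."""
    n, d = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(d)]
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                cov = sum((row[i] - means[i]) * (row[j] - means[j]) for row in z) / (n - 1)
                total += cov ** 2
    return total / d


if __name__ == "__main__":
    collapsed = [[0.5, 0.5]] * 4  # every sample has the same embedding
    spread = [[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
    print("collapsed:", variance_penalty(collapsed), covariance_penalty(collapsed))
    print("spread:   ", variance_penalty(spread), covariance_penalty(spread))
```

A fully collapsed batch receives nearly the maximum variance penalty, while a batch whose dimensions are spread out and uncorrelated receives none. Added to the prediction error, such terms make the constant-representation shortcut expensive.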

20:12And so because they work well, we think we can scale them up, we can apply them to a lot

20:17of different problems.

20:18This architecture, by the way, is called JEPA, Joint Embedding Predictive Architecture, and it's not generative.

20:24So I tell people in AI, you should stop doing generative AI or generative models.

20:31And, of course, it doesn't make me very popular.

20:35So you cracked the problem?

20:38Yeah, and so with this kind of technique, we can train the system to learn an abstract representation of observations

20:47that are complicated,

20:49train it to produce a world model, which is, again, state of the world, an abstract representation of an observation,

20:57and then from the state of the world and an action, predict the next state of the world,

21:00and then use this to plan by optimization, find a sequence of actions that will accomplish a task.
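Planning by optimization can be sketched with the same kind of toy world model: score candidate action sequences by imagining their outcome, then keep the best one. The sketch below uses random shooting, one of the simplest gradient-free optimizers; the transcript does not say which optimizer AMI Labs uses, and the point-mass model, the goal, the cost function, and the sampling range are all hypothetical.

```python
import random
from typing import List, Tuple

State = Tuple[float, float]  # hypothetical (position, velocity) of a point mass


def toy_world_model(state: State, action: float, dt: float = 0.1) -> State:
    """Hand-written stand-in for a learned world model; the action is an acceleration."""
    position, velocity = state
    velocity = velocity + action * dt
    return (position + velocity * dt, velocity)


def cost(state0: State, actions: List[float], goal: float) -> float:
    """Imagined cost of an action sequence: squared distance to the goal plus a small effort term."""
    state = state0
    for action in actions:
        state = toy_world_model(state, action)
    effort = 0.001 * sum(a * a for a in actions)
    return (state[0] - goal) ** 2 + effort


def plan(state0: State, goal: float, horizon: int = 10, samples: int = 2000, seed: int = 0) -> List[float]:
    """Random-shooting planner: sample action sequences and keep the cheapest one."""
    rng = random.Random(seed)
    best = [0.0] * horizon
    best_cost = cost(state0, best, goal)
    for _ in range(samples):
        candidate = [rng.uniform(-5.0, 5.0) for _ in range(horizon)]
        candidate_cost = cost(state0, candidate, goal)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best


if __name__ == "__main__":
    actions = plan((0.0, 0.0), goal=1.0)
    print("planned cost:", cost((0.0, 0.0), actions, 1.0))
```

Every candidate is evaluated only inside the model, so nothing is executed until a plan is chosen. In a model-predictive control loop, the agent would typically execute just the first action, observe the new state, and plan again.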

21:09And we're working on systems of this type that are hierarchical.

21:12I'm not going to go into the details, but we have good hopes that this will really be a complete

21:19change of the blueprints of intelligent systems

21:23that will perhaps take us to considerably more intelligent systems that can anticipate the outcome of their own actions,

21:32can plan, can reason, and perhaps all the way to human intelligence.

21:37But that's going to take a while. It's not going to happen next year.

21:39We're not going to have a country of geniuses in a data center next year, okay?

21:44You'll have different kinds of geniuses.

21:47Yeah, I mean, narrow geniuses, you know.

21:49I mean, there are two areas where LLMs are geniuses.

21:52Yeah.

21:54It's mathematics and code.

21:56And those are two domains where the mere manipulation of symbols is actually the substrate of reasoning.

22:03And it's not a coincidence.

22:05LLMs are good at manipulating symbols.

22:07And so they work well for math and code.

22:10You couldn't help taking that swipe at Anthropic.

22:13Dario Amodei very famously talks about, you know, a country of geniuses in

22:20a data center.

22:22There's something else that you're involved in called Project Tapestry.

22:26A big theme, I think, of this conference is sovereignty, you know, and, you know, the idea that Europe

22:33shouldn't be a handmaiden to the U.S.

22:35in AI.

22:37Tell me a little bit about Project Tapestry: what it is and how it would address that.

22:43So I had this idea for a number of years.

22:45I tried to push it inside Meta with very limited success.

22:48I've been trying to advertise it in various forums and governments.

22:51And I've been talking to a bunch of governments around the world.

22:54They all want AI sovereignty, right?

22:57And I think they're right.

22:58It's very important because pretty soon all of our information diet will be mediated by AI assistants.

23:07I'm actually wearing Meta AI on my nose right now.

23:09You know, these are the smart glasses.

23:10I can take a picture of you guys.

23:12All right.

23:13Okay.

23:14Smile.

23:15All right.

23:17Okay.

23:18Nice.

23:19Cool.

23:20You can recognize the face of every single person in there with that, right?

23:25That would have been possible when Meta was doing face recognition, but it stopped doing

23:29it in 2020 or 2022, whatever.

23:33So the thing is, all of our information diet is going to be mediated by AI assistants.

23:41If those AI assistants are proprietary systems built by a handful

23:47of companies on the west coast of the U.S. or in China, culture is in big trouble.

23:54Democracy is in big trouble, right?

23:56How do we get diverse sources of information?

23:59All of the systems have biases that are implicit.

24:01I mean, it's not because people designed them to have biases.

24:03It's just inevitable.

24:05It's like every newspaper has a bias, right?

24:07Even if it tries not to.

24:08So we need access to a wide diversity of AI assistants for the same reason we need access

24:14to a wide diversity of the press to get multiple sources of information.

24:19The only way I can see that this can happen is if there is an open, free foundation model,

24:26on top of which anybody can build their own specialized assistant for their language or languages,

24:35their culture, their value system, their political biases, their centers of interest.

24:39And so open source has to exist.

24:42Frontier models, you know, have to exist if you want to preserve this diversity.

24:48I think a lot of countries in the world understand this and are ready to put some resources into,

24:53you know, helping build such an effort.

24:56So Tapestry, if you type Project Tapestry on Google, you'll find it.

25:00It's hosted by the AI Alliance, a nonprofit that promotes open source AI.

25:07And the idea is that every country, region, private company, or university can contribute to training a global model

25:19that would eventually become a repository of human knowledge, using their own data

25:25and their own data center, contributing to the training of a global model without actually exchanging their data,

25:33only exchanging parameter vectors.

25:35It's a form of federated learning.
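The classic recipe for exchanging parameters instead of data is federated averaging (FedAvg): each participant runs a few local training steps on its own data, and a coordinator averages the resulting parameter vectors, weighted by how much data each participant holds. The sketch below fits a one-parameter linear model this way. The participants, their data, and the choice of FedAvg itself are illustrative assumptions; the transcript does not say which federated algorithm Tapestry uses.

```python
from typing import Dict, List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that stay with their owner


def local_update(w: float, data: Dataset, lr: float = 0.05, steps: int = 20) -> float:
    """Gradient descent on mean squared error for y ~ w * x, using only local data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(w: float, participants: Dict[str, Dataset], rounds: int = 10) -> float:
    """FedAvg: broadcast w, collect locally updated copies, average them by data size."""
    total = sum(len(d) for d in participants.values())
    for _ in range(rounds):
        updates = {name: local_update(w, data) for name, data in participants.items()}
        # In a real deployment only these parameter values would cross the network.
        w = sum(len(participants[name]) / total * wk for name, wk in updates.items())
    return w


if __name__ == "__main__":
    # Hypothetical participants whose private data all follow y = 3x.
    participants = {
        "lab_a": [(1.0, 3.0), (2.0, 6.0)],
        "lab_b": [(0.5, 1.5), (1.5, 4.5), (3.0, 9.0)],
    }
    print(round(federated_average(0.0, participants), 4))
```

The shared model converges to the slope that fits everyone's data even though no participant ever sees another's examples. Real systems add secure aggregation, handle participants whose data distributions differ, and train models with billions of parameters rather than one.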

25:38And so we're trying to sort of build this bottom up.

25:41Anybody who is, you know, a good machine learning researcher, engineer, expert in distributed optimization or whatever,

25:48can contribute to it.

25:49There is a GitHub.

25:51Easy to sign up.

25:52And we're trying to sort of do this organically.

25:54But I think we're going to get a lot of support from governments around the world.

25:58They really want this to exist.

25:59I know we're out of time, but just one final thing about that.

26:03When does that happen?

26:06It's happening now.

26:06Okay.

26:08You can go to that website.

26:10It's at the AI Alliance.

26:14And sign up, start hacking, form local communities, et cetera.

26:21There are groups that are, you know, non-profit university groups around the world that have produced really good local

26:28models.

26:29For example, Switzerland has a model called Apertus, a collaboration between ETH and EPFL.

26:34Pretty good model.

26:37The Emirates have a model called K2, sort of led by MBZUAI.

26:47And, you know, there are national efforts in India, in Korea, in Japan, in Europe.

26:55All those people should work together and, you know, build a common model.

27:00Okay.

27:01So, sovereignty.

27:02Thank you so much.

27:03We could go on forever.

27:04Thank you, Steven.

27:05It was fantastic.

27:06Thanks everyone.

27:06Thank you, Lord.
