Steinberger, LeCun, Habib on AI Deployment

Bloomberg


Transcript

00:00Peter, May, congratulations.

00:02What a lineup.

00:04Yann, I want to start with you.

00:05I may be over-simplifying,

00:08but this is my understanding.

00:09You are now squarely focused after your time at Meta

00:12on world models.

00:14Because, and here's possibly where I'm simplifying,

00:16you think that frontier LLMs

00:19are not going to produce the kind of AI

00:21that is going to be globally useful

00:24at the scale and level that we need it to be.

00:27If that is the case,

00:29help frame it for us.

00:30If, for example, we're thinking of a young infant,

00:33what is it that a young child can do

00:36and understands about the world

00:37that maybe LLMs, even those right at the frontier,

00:40cannot yet do?

00:41Okay, first of all, LLMs are useful.

00:45They're not a path towards human-level intelligence,

00:49but they are useful.

00:49There's a lot of products,

00:51including computer products,

00:53including AI systems,

00:54that are very useful,

00:56but that are not a path to human-level intelligence.

00:59And so that's the confusion,

01:00I think, that we need to pay attention to.

01:04Let me give you a very simple number.

01:08The biggest LLMs are trained, or pre-trained

01:12at least, on the totality

01:14of all the publicly available text on the Internet.

01:17That's about 20 trillion words,

01:1930 trillion tokens.

01:20The token is about three bytes.

01:22Do the arithmetic,

01:23it's about 10 to the 14 bytes.

01:25This is the amount of data

01:28a four-year-old has seen through vision

01:30during four years.

01:33The text, though,

01:34would take 400,000 years to read, right?

01:37So there is enormously more data

01:39from sensory input like vision or touch

01:43or everything else

01:45than there could ever be through language.
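The arithmetic in this passage can be checked with a rough back-of-envelope sketch. The token count, bytes per token, and word count come from the transcript; the reading speed, reading hours per day, waking hours, and optic-nerve data rate are assumptions added here for illustration and are not stated by the speaker.

```python
# Figures stated in the transcript
TOKENS = 30e12            # about 30 trillion tokens
BYTES_PER_TOKEN = 3       # about three bytes per token
WORDS = 20e12             # about 20 trillion words

text_bytes = TOKENS * BYTES_PER_TOKEN  # on the order of 10**14 bytes

# Assumptions (not from the transcript): a fast reader, eight hours a day
WORDS_PER_MINUTE = 250
READING_HOURS_PER_DAY = 8
years_to_read = WORDS / WORDS_PER_MINUTE / 60 / READING_HOURS_PER_DAY / 365

# Assumptions (not from the transcript): roughly 16,000 waking hours in
# four years and roughly 2 MB per second arriving through the optic nerves
WAKING_HOURS = 16_000
VISION_BYTES_PER_SECOND = 2e6
vision_bytes = WAKING_HOURS * 3600 * VISION_BYTES_PER_SECOND

print(f"text corpus:   {text_bytes:.1e} bytes")
print(f"time to read:  {years_to_read:,.0f} years")
print(f"vision, 4 yrs: {vision_bytes:.1e} bytes")
```

Under these assumptions the text corpus and four years of vision both land near 10^14 bytes, and the reading time comes out at a few hundred thousand years, the same order of magnitude as the figure quoted.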

01:47Language is a very approximate,

01:50reduced, quantized, simplified description of the world.

01:54And LLMs can only deal with discrete sequences of symbols,

01:58but the world is much more complicated than language.

02:02So, it is very well known in computer science.

02:05It's called Moravec's paradox.

02:06It's the fact that computers can do tasks

02:09that are seemingly complicated for humans,

02:11like solving equations and computing integrals

02:14and even passing the bar exams

02:17or answering any question.

02:19But when it comes to just, you know,

02:21grabbing an object like this without breaking it,

02:23that's kind of much more challenging.

02:25And so there's a lot of tasks that, you know,

02:27four-year-olds can do that we can't do with robots.

02:30That's why we have those systems, again,

02:33you know, prove theorems and write code, thanks to you.

02:37But we don't have domestic robots.

02:42We don't even have level-five self-driving cars.

02:44I mean, we have them, but we cheat.

02:47So, you know, dealing with the real world

02:49is just much more complicated.

02:50That's what my company is really trying to do.

02:54That's what I've been trying to do for 15 years,

02:55actually more than that.

02:57But, you know, kind of partially succeeding

02:59over the last five,

03:00and that it made sense to create a company around that.

03:02So one of the things that makes me think about

03:04is the huge needs for data and compute.

03:07And the market is clearly betting

03:08that more data, more compute, scaling continues.

03:14If you succeed,

03:15is there going to be more demand for data and compute?

03:17Or is the market misallocating right now

03:20around the data and compute story?

03:21What is your take on whether or not

03:23the market is getting this right?

03:24There is a need for compute.

03:26Nature abhors a vacuum.

03:27But, that said, the model that we are training

03:31that essentially tries to understand the world.

03:33You mentioned the phrase world model.

03:37What is a world model?

03:38Given an idea of the state of the world at time t,

03:40and given an action that you imagine taking,

03:44what is going to be the state of the world at time t plus one

03:47after you've taken this action?

03:48If you have such a model, predictive model,

03:51you can predict what the consequence

03:54of a sequence of actions is going to be.

03:56And so, if you want a robot, an agent, or whatever

03:59to accomplish a task,

04:00you give it a cost function that measures

04:02to what extent the task has been accomplished.

04:05And then you search, you plan,

04:07you search a sequence of actions

04:09that will accomplish this action internally.
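The loop described here (predict the next state, score it with a cost function, search over action sequences) can be sketched in a few lines. This is a minimal toy under stated assumptions, not the speaker's system: the one-dimensional world, the names `world_model`, `cost`, and `plan`, and the brute-force search are all hypothetical choices made for illustration.

```python
from itertools import product

# Hypothetical toy world: the state is an integer position on a line.
ACTIONS = (-1, 0, 1)  # step left, stay, step right


def world_model(state: int, action: int) -> int:
    """Predict the state at time t+1 from the state at time t and an action."""
    return state + action


def cost(state: int, goal: int) -> int:
    """Measure how far the task is from being accomplished (0 means done)."""
    return abs(goal - state)


def plan(start: int, goal: int, horizon: int) -> tuple[int, ...]:
    """Try every action sequence of length `horizon` inside the model and
    return the one whose predicted final state has the lowest cost."""
    best_seq, best_cost = None, None
    for seq in product(ACTIONS, repeat=horizon):
        state = start
        for action in seq:
            state = world_model(state, action)  # imagined, never executed
        c = cost(state, goal)
        if best_cost is None or c < best_cost:
            best_seq, best_cost = seq, c
    return best_seq


best = plan(start=0, goal=3, horizon=4)
print(best, "-> predicted cost", cost(sum(best), 3))
```

Exhaustive search only works for tiny action sets and short horizons; it is used here because it makes the "search a sequence of actions" step explicit. A real planner would replace it with a smarter optimizer and a learned model.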

04:11We do this all the time as humans.

04:13We use the power of our prefrontal cortex,

04:16which is a world model,

04:17to predict what's going to happen

04:20as a consequence of our actions.

04:21That's what allows us to plan and to live in the world.

04:24That's what allows us, when we learn to drive,

04:27we learn to drive in a few hours of practice

04:29when we are teenagers.

04:31We have millions of hours of training data

04:33of people driving cars.

04:34We still cannot train a system

04:37to just clone the human behavior

04:40of driving reliably,

04:42which is why you can't buy a level-five self-driving car.

04:46So, how is it that, you know,

04:49a 17-year-old can learn to drive

04:50in a few hours of practice?

04:53If the 17-year-old drives near a cliff,

04:57the mental model that we have,

04:59the knowledge of the world,

05:00tells us, if I turn the wheel to the right,

05:02the car will run off the cliff.

05:03Nothing good is going to come out of this, right?

05:05So, we don't do it.

05:07So, the ability of predicting the effect of our actions

05:10is what allows us to learn quickly

05:12and also to accomplish new tasks

05:15that we have never been trained to accomplish.

05:17That, I think, is essential,

05:20not just for the future of AI,

05:21but even for building agentic systems.

05:24You should know something about it, right?

05:26Like, I cannot imagine, he's on it,

05:29but I cannot imagine how you can possibly

05:32build agentic systems that are reliable

05:35unless they can predict the consequences of their actions.

05:39And LLMs simply do not do it.

05:40They can apply recipes, but blindly.

05:45Well, May, this seems like a good point to bring you in,

05:47and then, of course, we'll bring in Peter.

05:48May, at Writer, you build your own models,

05:51and you deploy them.

05:52So, my question for you is,

05:53was it more difficult building that model

05:55or persuading the first Fortune 500 company to adopt it

05:59and actually get it to use that system

06:02on their enterprise data and do real work?

06:06It is interesting that at every era of the last five years,

06:09there has been a gap between what the models can do,

06:13what the capabilities are,

06:15and what the enterprise can trust

06:17to actually put into production.

06:19And especially now with agentic systems,

06:22enterprises want autonomy,

06:24but they want it to be governed, not unchecked.

06:26And what we've been doing for the past five years

06:29is, yes, building the models,

06:30but also the harnesses around the models

06:32that allow us to collect the recipes, right,

06:35that exist across the business,

06:36but more crucially, actually allow us to collect

06:39all of those corner cases that sit in people's heads.

06:42And a lot of what we've done for the Fortune 500

06:45is be able to really iterate and innovate

06:48on memory systems that allow agents

06:52to collect new information

06:54while they're operating in a highly governed,

06:57secure, regulated context

06:59that improves the performance of these agents

07:02when, you know, you don't have the recipes

07:06or the workflows or the SOPs really well defined.

07:09What has to happen?

07:10Maybe we're there already.

07:11Maybe you're starting to see that.

07:12In terms of moving from enterprise AI

07:16being at the pilot stage

07:18to being broadly used as infrastructure

07:21across the enterprise.

07:22Well, I would say there's three stages,

07:25even in a mature organization today.

07:27There's the successful pilots.

07:29There's the stuff you've gotten into production.

07:31You can, you know, brag to your board about,

07:33but then there's still a big gap

07:34between stuff that people have

07:36or say that they've got in production

07:38and what's really scaled.

07:39And even when we've got, you know,

07:41really successful agents in production

07:44at a pharma company or at a bank,

07:46scale for us is the whole process now looks like this.

07:51And you are also able to have

07:54the much more autonomous capabilities

07:57unleashed into the enterprise.

07:58And, you know, there is an org design gap for sure

08:03in that the vast majority of organizations

08:05simply don't have the talent on the business side

08:08to orchestrate and oversee these systems,

08:11build the evals, build the launch plans, etc.

08:14Nor do they have the willpower

08:17to change the roles and responsibilities

08:20of the people who were, you know,

08:22doing it the old way.

08:23And so there's certainly a gap on the technology side.

08:26You know, we are seeing a real distrust

08:29of what the foundation model companies

08:32are bringing to market

08:33because agentic explosion

08:36means a token cost explosion.

08:39And agents are using 1,000 times

08:41the token budgets of chatbots.

08:44And so really being able to have the harnesses

08:46that send tokens to the right model, right,

08:49based on the task,

08:50based on the ROI of that task

08:52are also things that we've been building

08:54as the capabilities of our models

08:56have, you know, covered a greater surface area.
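The idea of a harness that sends tokens to the right model based on the task and its ROI can be sketched as a simple router. Everything below is hypothetical: the model names, capability scores, prices, and task values are made up for illustration, and nothing here describes Writer's actual product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    capability: int                # hypothetical 1-10 skill score
    usd_per_million_tokens: float  # hypothetical price


@dataclass(frozen=True)
class Task:
    name: str
    required_capability: int
    expected_tokens: int
    value_usd: float               # what completing the task is worth


# Hypothetical catalogue; names and prices are invented.
MODELS = [
    Model("small", 3, 0.50),
    Model("medium", 6, 3.00),
    Model("large", 9, 15.00),
]


def run_cost(model: Model, task: Task) -> float:
    """Estimated spend for running this task on this model."""
    return model.usd_per_million_tokens * task.expected_tokens / 1_000_000


def route(task: Task, models=MODELS) -> Optional[Model]:
    """Pick the cheapest capable model whose cost stays below the task's
    value; return None when no model gives a positive return."""
    capable = [m for m in models if m.capability >= task.required_capability]
    affordable = [m for m in capable if run_cost(m, task) < task.value_usd]
    return min(affordable, key=lambda m: run_cost(m, task), default=None)


chat = Task("answer a FAQ", 2, 2_000, 0.05)
agent = Task("reconcile invoices", 6, 2_000_000, 20.00)  # 1,000x the tokens

for task in (chat, agent):
    choice = route(task)
    print(task.name, "->", choice.name if choice else "no model pays off")
```

The agent task uses 1,000 times the tokens of the chat task, mirroring the ratio mentioned in the transcript, so the price gap between models that barely matters for a chatbot becomes the deciding factor for an agent.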

08:59Agentic explosion.

09:00So, Peter, you caused quite the agentic explosion

09:02with OpenClaw.

09:03I still remember the images of people lining up

09:06in China outside stores,

09:07including grandparents,

09:09to get OpenClaw downloaded

09:11on their iPhones and iPads.

09:13Was there a moment...

09:14There must have been a moment

09:16when you thought,

09:16wow, this is going to be big.

09:20Was there that moment for you?

09:21You know, for me,

09:23that moment was in late November.

09:27Like, there was, like,

09:28where I sent a voice message

09:30and suddenly it just worked,

09:31even if I didn't build it that way.

09:34But then I tried to, like,

09:38explain to people and I failed.

09:40You know?

09:41Like, when I...

09:43When I invited people one-on-one

09:45in, like, a WhatsApp group chat,

09:47they always were, like, blown away

09:48or some were scared.

09:51But on Twitter, I was very mute.

09:54So it took me a few weeks

09:56to, like, figure out

09:57how can I show this to people?

09:59Yeah.

10:00Until one day,

10:02I created a Discord server

10:05and just everyone could come,

10:07could, like, watch me work.

10:08I was kind of, like,

10:09building OpenClaw

10:10with OpenClaw at the time already

10:12and people could interact

10:13with the models

10:14and people tried to do malicious things

10:16and it wouldn't work

10:16and suddenly people, like,

10:19bit by bit,

10:20started feeling the magic.

10:22And the funniest story is

10:24I did this the whole night.

10:26You know, it was, like, very exciting

10:27and then at 7 a.m.,

10:29I was ready to go to bed.

10:31I pressed Command C,

10:35you know, to close everything,

10:36went to bed,

10:38slept for, like, 10 hours.

10:40And then I woke up

10:41and there were, like,

10:42800 messages.

10:44Like, I forgot that I built a system

10:46in a way that was resilient

10:47and, like, while I was walking to the bed,

10:49it would restart

10:50and then happily answer

10:51to everyone in the world.

10:55Exciting times.

10:55Well, that's pretty wild

10:58and that takes me

10:58on to my next question

10:59quite nicely,

10:59which was,

11:00can you give us an example

11:01of a use case with OpenClaw

11:03where it made you laugh

11:05and where it made you

11:06maybe a little nervous?

11:14One of my friends,

11:15he has his personal claw

11:18and he has one at work

11:20and then at work,

11:21he pulled up all the meetings

11:23he has for the week

11:24and then talked to the agent

11:26to connect to his personal claw

11:28to get his blood sugar levels

11:30per hour

11:31and then the two agents

11:32connected it

11:33to tell him

11:34which person stressed him out

11:35the most.

11:38And the one that made me

11:39a little bit nervous

11:40is when I was talking

11:42to a friend

11:42and he was like,

11:44you know,

11:44I love OpenClaw.

11:45I was like,

11:46what do you use it for?

11:47You know, my mom,

11:48she always complains

11:50that it takes me so long

11:52to answer.

11:53So I built a claw

11:54to answer.

11:57And she only noticed

11:58because I answered too fast.

12:02That's when it's too good.

12:05Peter,

12:05Yann,

12:06thank you very much indeed.

12:07Really fantastic

12:07on the agentic piece,

12:08the enterprise AI story,

12:10of course,

12:10and the prospect

12:11of world models

12:12and what that could deliver

12:13and particularly

12:14in combination

12:14with LLMs.

12:16Yann,

12:16Peter and May,

12:17thank you very much indeed.

12:18Take your trophies

12:19and thank you.

12:20Congratulations.
