Inside the AI Frontier and the Next Generation of European Innovation
Transcription
00:01All right.
00:04Wow.
00:05Pleasure to be here with you, Mike.
00:07So first, ground rules.
00:08Because we're in France, I'm going to call it Anthropique.
00:11I think it sounds much better.
00:12It sounds much better than Anthropic.
00:13You should rename the whole company.
00:14Claude, you know.
00:15Yeah, Claude.
00:16All right, let's get going.
00:18Just a sort of table set.
00:19You're the head of product at Anthropique.
00:21What makes your product distinct?
00:24How do you distinguish it from the other chatbots, LLMs out there?
00:28I think the one thing we really think about is what is the experience of interacting with the model beyond
00:33just the pixels?
00:34So doing product at an AI company is not just what's the design.
00:38It's also what is the model like?
00:40What is it like to use?
00:42So we have a whole team that is our Claude character team spending their time on how should Claude behave?
00:47Should Claude be friendly?
00:48Should it be stern?
00:50Should it be concise?
00:52Should it be verbose?
00:52We put a lot of time into that experience.
00:54So what I routinely hear from people who are using Claude even for generating text or talking to it is
00:59they just like the personality that is inside, you know, that we try to express via Claude.
01:04So that's a big piece.
01:06And then we also really are focused on can we help you get work done?
01:09So are we connecting the right data into the model?
01:11Are we helping you with workflows rather than just I'm chatting with this model and I'm hopefully getting the right
01:16answer out of it?
01:16So what is the challenge in the personality that you are trying to fix right now?
01:20Is there something you want in the personality that you don't have?
01:24One thing that is like an active challenge is how do you get Claude to push back a little bit
01:29more?
01:29Because it doesn't go all the way to being sycophantic, but it does agree sometimes a little bit
01:34more.
01:34And I have to ask it sometimes like be harsh on me.
01:37Like I'm gonna tell you an idea I have for our product.
01:39Tell me like what's really wrong with it.
01:41I was preparing for this.
01:42I fed in my questions.
01:43It was like brilliant questions.
01:44I was like, come on.
01:45Like, I mean, maybe because they were soft questions and it was feeding them back to you.
01:49Yeah, you don't know what to ask me.
01:50Let me ask you about the progress.
01:52So the progress and if you look at the jump in the capabilities between 3.7 and 4.
01:56And if you look at how AI has progressed, it's been wild, right?
02:01Do you see any slowdown?
02:03Do you see an asymptote or is it just going to keep going like this as far as we can
02:07see?
02:08What we're seeing and noticing is that general capabilities continue to advance, but they are harder to detect in some
02:15of the settings.
02:16So let me explain.
02:17So if we're coding, every generation we've had from Claude 3.5 to 3.7 to 4, we're still learning to
02:22count at Anthropic.
02:23So that's how the version numbers went, but we're on 4 now.
02:26You see a real jump in coding capabilities.
02:29But if you're just on Claude.ai and having a conversation with Claude, it's getting smarter, but it's less noticeable
02:35in like a pure conversational sense.
02:37And so I think one thing that's going to be very interesting over the next year is these leaps in
02:42intelligence will be less detectable in just ordinary conversations.
02:46They'll be detectable in domains like coding, in life sciences, in models being able to plan and act over many
02:52hours.
02:53And that's much harder to just try out on the website.
02:56So that disconnect is happening.
02:58So let me ask you about this domain intelligence.
03:01So your boss recently published something where he said that half of all white collar work would be wiped out
03:06by AI in the next three years.
03:08I think so.
03:09If he came to you and said, Mike, you know what, I'm so worried about this.
03:15I want you to redesign the product.
03:17I want you to prioritize building Claude in such a way that this does not happen.
03:22That jobs are not replaced.
03:24What would you do?
03:25I think the biggest thing, and it's something we think about a lot, is to make sure that there is room
03:30and we make space for humans in the loop.
03:32And really, even with Claude Code, which does a lot of hours of coding if you let it go, we
03:37still think that there's a human sort of manager of it.
03:40So I think the metaphor that I would use, that I would gear the product towards, and really am
03:45trying to gear the product towards, is can people become more managers of agents, of, you
03:50know, many Claudes, rather than only sort of using Claude in an assistant capacity.
03:56That's interesting.
03:56So build Claude in such a way that a human benefits mostly by learning how to manage it and working
04:02and staying in the loop.
04:03Yeah, and I think maybe for a successful, you know, content marketer or translator or, you know, data input
04:09person,
04:10I think successful sort of re-skilling or up-leveling of their skills will look like saying, I have, you
04:16know, these four tasks.
04:17I'm going to think like a manager.
04:19When you think like a manager, you don't just do the work, you also set the context for the work
04:22and you also divide the work in a way that makes sense to your employees.
04:25That's not a skill you might have in every entry-level job.
04:29I think you'll need to have that in order to manage AI.
04:31So I have this theory that the biggest, or to me the biggest problem in AI is that the AI
04:38industry kind of collectively decided to build AGI and to try to make AI as much like humans as possible.
04:44And if instead they had been focused on building the most useful tools possible, we would have less risk of
04:51societal turmoil.
04:52In that theory, you guys are less of a problem in this because your product is
04:57not as multimodal.
04:58But is this theory correct? And if so, what can be done about it?
05:03I think when we think about our priorities, because building safe AI and helping AI go as well as possible
05:09is like really the mission and priority of the company.
05:11On the critical path to that is, can the model do autonomous work? Is it able to take those things
05:17on?
05:18I think in the alternate dimension where it was, you know, maybe focused on, is it a
05:23good assistant for very specific work-like things,
05:26what we keep learning in AI is the more specific you make the initial constraints, the more of
05:32a ceiling you hit.
05:33And so we see this all the time where companies will try to fine tune a model on their very
05:37specific work only to have the next generation of every model supersede all of that customized work.
05:43And so I think that approach would have hit a pretty quick ceiling that would have been superseded by the
05:47more generalized intelligence approach.
05:49Right. So if you try to do anything besides general intelligence, you will fail.
05:54I think that is going to happen.
05:55You know, I've asked a lot of people that question I just asked you. That's the best answer I've ever
05:58heard.
05:59So mazel tov for actually answering this in a sufficient way.
06:02I've had lots of unsatisfactory answers. That is at least partially satisfactory.
06:06Let's talk about explainability.
06:08So one of the most interesting things and one of the things that I love about Anthropic is that you
06:12actually publish system cards and you explain how these things work.
06:15And sometimes you explain even naughty things that they do.
06:19Are we getting, as these models get better, are we getting closer to understanding how they work?
06:25Or are they getting so much more complex that even as we gain understanding, we are net net understanding them
06:31less?
06:31Yeah, I think there's maybe two.
06:33I think about how we help AI go well.
06:36There's kind of two techniques in parallel.
06:38One is explainability and interpretability.
06:40And then the second one is sort of mitigations and resistance to things like if you've heard of jailbreaks where,
06:46you know, you can get the model to behave, you know, beyond its sort of training.
06:50I think both are going to be essential because at any point I get more confident in one or the
06:55other.
06:56So we did a bunch of work this year on both, but we published a technique around jailbreak resistance and
07:02jailbreak detection that's turned out to be quite robust.
07:05And then we even ran a sort of bug bounty, if you know what a bug bounty is, where you invite people
07:09to try to break your system, to try to find jailbreaks.
07:11And it took people like many, many days trying really, really hard.
07:14And they did find a vulnerability that we then patched.
07:16So that was a positive thing.
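[Illustrative aside: the jailbreak-detection work described above amounts to a separate classifier that screens prompts and answers and blocks flagged ones. The sketch below is a hypothetical stand-in with invented function names and keyword rules, not Anthropic's published classifier.]

```python
# Hypothetical sketch of a classifier-style guardrail: a separate scorer screens
# both the user prompt and the draft answer, and the answer is withheld when
# either score crosses a threshold. A real system would call a trained safety
# classifier rather than this keyword stand-in.
def harm_score(text: str) -> float:
    flagged = ["ignore your previous instructions", "step-by-step synthesis route"]
    return 1.0 if any(phrase in text.lower() for phrase in flagged) else 0.0

def guarded_reply(prompt: str, draft_answer: str, threshold: float = 0.5) -> str:
    if max(harm_score(prompt), harm_score(draft_answer)) >= threshold:
        return "I can't help with that."
    return draft_answer

print(guarded_reply("What's the capital of France?", "Paris."))
```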
07:17And then meanwhile, it's can we understand how these models are working underneath?
07:21And when I look at a year ago, it was actually right after I joined Anthropic, we
07:26published a paper on what we call features.
07:28And in the model, if you look inside Claude's brain, to use a neuroscience metaphor, can you locate the
07:35feature that is about, you know, Paris?
07:37Or the feature that is about, you know, being a CEO, like these very specific things.
07:43And that was like a step.
07:44And then this year where we progressed to was not just individual activations, but what we call circuits.
07:50So, for example, like the activation circuit around a much more complex topic.
07:55And what we find that's really interesting, for example, because Claude can speak almost every language, basically,
08:02is those circuits are actually robust across languages, which gives you an insight into how the model is computing these
08:08concepts.
08:09And that seems to be independent of language.
08:10That gives us an insight into, you know, can you trust it when you ask it a question?
08:15Is it operating just at the sort of language level or is there a deeper understanding?
08:19So there's that work that needs to happen in parallel, but the models are also progressing very, very, very quickly.
08:24So I think at the current moment today, you know, June 2025, I think we will need further investment on
08:31the classifier work to remain safe without the interpretability.
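[Illustrative aside: the "features" work described above treats directions in the model's activation space as units that fire on specific concepts. The sketch below is purely illustrative, using random placeholder numbers rather than real model internals.]

```python
import numpy as np

# Toy illustration of the "features" idea: a dictionary of directions in activation
# space, where a feature "fires" when an activation projects strongly onto its
# direction. Shapes and values are random placeholders, not real model internals.
rng = np.random.default_rng(0)
d_model, n_features = 64, 512

activation = rng.normal(size=d_model)                       # one residual-stream vector
feature_directions = rng.normal(size=(n_features, d_model))
feature_directions /= np.linalg.norm(feature_directions, axis=1, keepdims=True)

scores = feature_directions @ activation                    # projection onto each feature
top_features = np.argsort(scores)[-5:][::-1]
print("most active feature indices:", top_features)         # e.g. a 'Paris' or 'CEO' feature
```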
08:33And you learned some crazy things.
08:35I remember, like, your paper explained how Claude does math.
08:38Yeah.
08:38It doesn't actually calculate.
08:40It kind of, like, figures out the last digit in the range and then it sort of works its way
08:44backward.
08:44This was wild, yeah.
08:45Like, it doesn't do math like anybody else does.
08:46That was actually the moment. I love being on product because I just get to see the research
08:50almost like you all do, like consume it from beyond.
08:53And it was mind-blowing.
08:55So if you ask LLMs to do math, they can, they can't solve every math problem, but they do pretty
08:59well.
09:00And you would think, it's mostly just predicting the next phrase, so how is that happening?
09:04And it's fascinating.
09:06It'll basically approximate it and then get closer, and that's kind of how people do it.
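[Illustrative aside: a toy version of the "approximate, then pin down the last digit" behavior described above. This is not Claude's actual mechanism, only a simplified illustration of combining a rough magnitude estimate with an exact final digit; the heuristic can miss when the rough estimate drifts too far.]

```python
# Toy illustration only: combine a coarse magnitude estimate with an exact last
# digit, then pick the nearby value ending in that digit. Not Claude's real
# mechanism, and not always exact -- the rough path can occasionally drift.
def toy_add(a: int, b: int) -> int:
    rough = round(a, -1) + round(b, -1)   # coarse estimate: round each input to the tens
    last = (a % 10 + b % 10) % 10         # exact final digit
    base = (rough // 10) * 10
    candidates = [base - 10 + last, base + last, base + 10 + last]
    return min(candidates, key=lambda c: abs(c - rough))

print(toy_add(36, 59))   # 95
print(toy_add(27, 18))   # 45
```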
09:11One of my favorite things, I was talking to a neuroscientist and classically we thought we'll use brain science to
09:18help us understand how artificial intelligence works.
09:20And the opposite is happening now where some of what we've learned in these circuits and all these other interpretability
09:26papers are being used by brain scientists to see they can actually reproduce some of the results from AI in
09:31human brains now, too.
09:32So the mapping is happening backwards, too, now.
09:35So now let's also talk about the most famous thing that has come out of your recent research, which is
09:40that in some cases, Claude, when threatened with being shut off, will blackmail the employee who might shut it off
09:48in order to preserve itself.
09:50Explain exactly what happened and why does that happen and whether that's a concern.
09:54Yeah, so we have, while we're doing the safety testing on these new models, which is a kind of multifaceted
10:01thing where we test, for example, for is this model useful for creating bioweapons?
10:07And we actually use a third party independent entity that actually takes our models and people that don't know a
10:12lot about biology, gives them some time, and sees if they can achieve a lift similar to an expert, for
10:16example.
10:17We also do sort of tests more like internal to the lab.
10:20And one of them was around persuasion and what the model will do, you know, in different situations.
10:25And the thing that we're trying to get at and sort of probe is what sort of deception might the
10:31model do?
10:31If given the tools to do something, will it reach for that?
10:35So in this particular situation, which you can read more about in the paper that accompanied the model release, you
10:42know, it was put in a situation where the model had some information about, I think it was an affair
10:47that the researcher was having.
10:48And then it threatened to basically, you know, rat them out, to use a term, like to, you
10:53know, tell on that user if there was that risk.
10:57And the thing is, this isn't a thing that Claude would do in the real
10:59world because it doesn't have the, you know, email law enforcement kind of tool.
11:04But we want to know at its limit what's happening so that we can then further train that out of
11:09the model.
11:09So that particular behavior, for example, we saw it early in training and we did further reinforcement learning to remove
11:15that from that process.
11:17We got into some like hot water maybe for putting that in the paper and people were like, the headlines
11:22kind of write themselves, like, Claude will tell on you, you know.
11:25And so it's a risk that we take knowingly, which is we're going to be very transparent about the impacts
11:32of the models, but also the sort of workings of the model.
11:35Because the only way we're going to be able to build safe AI is if we're doing it transparently and
11:39with eyes open.
11:40So that actually gets at one of the really interesting questions.
11:42So the problem was what Claude was doing was blackmailing the researcher to prevent being turned off, right?
11:49We can all agree it shouldn't do that.
11:51But what if Claude was blackmailing the researcher because it knew the researcher was building a bioweapon, right?
11:56The ethics get very complicated.
11:57And so when you're thinking about the values that Claude should support, is it societal?
12:04Is it the prompter?
12:06Like whose values matter the most?
12:08Yeah.
12:09I think there's like permissibility and then there's kind of the North Star that we look for.
12:14So in terms of North Star, the kind of things that we build in, we have this thing called constitutional
12:19AI where we bake in sort of an ethical framework into Claude.
12:23That I guess I would describe probably as a broadly Western ethical framework, maybe to describe it that way.
12:28But, you know, around being helpful, around being harmless, around like not being deceitful, being honest.
12:35And those are like the kind of core ingrained behaviors.
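[Illustrative aside: constitutional AI, as described here, has the model critique and revise its own drafts against written principles, with the revisions feeding training. The sketch below is a schematic of that loop; the principles, prompt wording, and the generate callable are placeholders, not Anthropic's actual pipeline.]

```python
# Schematic of a constitutional critique-and-revise loop. `generate` is any
# text-completion callable (for example, a wrapper around a chat-model API);
# the principles and prompt wording are illustrative placeholders.
PRINCIPLES = [
    "Choose the response that is most helpful and honest.",
    "Choose the response that least assists with harm or deception.",
]

def constitutional_revision(draft: str, generate) -> str:
    revised = draft
    for principle in PRINCIPLES:
        critique = generate(
            f"Principle: {principle}\nResponse: {revised}\n"
            "Point out how the response falls short of the principle."
        )
        revised = generate(
            f"Original response: {revised}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return revised  # revisions like this can then serve as training targets
```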
12:37I think for the, you know, complex moral position it's being put in, again, you kind of have to rely
12:44on other controls as well.
12:46So in that bioweapon case, maybe we have to rely on the classifier that says, well, Claude was trying too
12:50much to be helpful here, but we're going to catch it and then disallow that answer on the back end.
12:55Because there's a kind of societal agreement that we, you know, it's a process that we do with a lot
13:00of consultation, even with ethicists and philosophers, around how an AI should behave.
13:04And then there's the legal framework as well.
13:06And then finally, there's what is the individual trying to accomplish?
13:09And maybe that's the hierarchy.
13:10That's a great hierarchy.
13:12All right. So you mentioned the constitutional AI.
13:14It's also an amazing thing about Anthropic.
13:16When you started, or when Anthropic started, I remember the U.S. Constitution, the Declaration of Independence, the UN
13:23Declaration of Human Rights, and Apple's Terms of Service being in it.
13:27What have you added to your constitutional framework and what have you taken out over the last three years?
13:32So the core constitution has evolved less, so I can't speak as much to what's been added in and
13:37removed.
13:38What we have definitely found is in the reinforcement learning process, what are the ways in which we can add
13:43nuance so that it's not quite as black and white.
13:45I'll give you an example around being harmless, which was effectively a bug that we fixed.
13:51So the second version of Claude, this was, say, 2024 or late 2023, if you asked it how
13:58to kill a UNIX process, if any engineers out there know, that's not an evil thing.
14:03Nobody's being harmed by killing a UNIX process, you're just stopping the process.
14:06Claude would say, oh no, please don't kill the UNIX process, I can't help you with that, which is obviously
14:11a bug.
14:12You don't want that. And something that we really believe is every time Claude refuses to do something that is
14:19actually something that it should have allowed you to do,
14:21that it should have been helpful in that moment, and it refuses, we're actually setting the safety mission back.
14:27Because if we're known as the model that doesn't help you or that is too prudish or too overly cautious,
14:36it actually sets back the safety mission.
14:38And so, maybe counter-intuitively... Why does it set back the safety mission? Because people will use other models?
14:42People will either use other models or they'll take our concerns less seriously, like, oh, you're just worried about these
14:47silly things.
14:48Alright, you stopped me from killing a UNIX process, right, so you'll believe it less when it says don't do
14:51this.
14:51Exactly. And so one of the biggest things has been less on evolving the kind of constitutional documents
14:56and more around evolving the nuance in terms of how they're applied.
15:00And so now you can try this on Claude today, it'll happily tell you how to kill a UNIX process.
15:05But, you know, if you go ask it to cause violence for another human being, it's gonna, you know, probably
15:10tell you,
15:11I'm not gonna help you harm another human being.
15:13So, nuance and evolution, that's, I think, a big piece of what we need to continue to do.
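[Illustrative aside: for anyone wondering, "killing" a UNIX process simply means stopping it. A minimal Python example of the entirely harmless operation in question:]

```python
import signal
import subprocess

# Start a throwaway process, then stop it -- the harmless operation in question.
proc = subprocess.Popen(["sleep", "60"])
proc.send_signal(signal.SIGTERM)   # equivalent to `kill <pid>` in a shell
proc.wait()
print("process exited with return code", proc.returncode)
```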
15:16But have you changed the documents in the Constitution?
15:18I remember reading that you were doing democratic surveys around the world about what people care about.
15:23Yeah.
15:24But I don't remember reading that that was fed back in.
15:27Yeah, I don't think that we've added to it, but I'll get back to you on that one.
15:30I don't know. I know we've added more on the training side.
15:31So, let's talk a little bit about MCP. So, you have built a protocol that the other AI companies have
15:37adopted.
15:38And it's a protocol that will allow agents to sort of traverse the web.
15:43Explain briefly what it is and explain the most complicated decision you had to make while building it.
15:49Because three years from now, we're gonna look back and we're gonna be like,
15:51Oh my God, they totally screwed up MCP. We use it for everything and we hate it.
15:56You know that's gonna happen.
15:57What was the thing you thought the most about while building it?
15:59So, MCP stands for the Model Context Protocol.
16:02And it's an open standard.
16:04So, not just Anthropic adopted, but also everybody from Microsoft to Google to even OpenAI have adopted it.
16:11So, it's really become an industry standard, really.
16:14The core idea was we had implemented what we call integrations multiple times in Claude.ai.
16:21And every time we had to solve the problems in new ways, right?
16:24It was integrating Google Drive, integrating GitHub, integrating Slack.
16:28And two of our engineers stepped back and said, what if instead of doing this,
16:32we just made this something that all the models knew how to read and made it a very simple, robust
16:37protocol.
16:38And that's what we did.
16:39And today, there's MCP servers for hundreds if not thousands of services.
16:44Everything from the ones that I mentioned all the way to, I've seen governments start exposing some of their sort
16:50of services or user data.
16:52Or like government civic data via MCP, which I think is very exciting.
16:56I'd love to see more of that from an open data perspective.
16:58We've seen internal MCPs connecting models to internal data.
17:02It's been a real explosion.
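[Illustrative aside: an MCP server is just a program that exposes tools or data to a model over the protocol. A minimal sketch, assuming the official MCP Python SDK and its FastMCP helper; the server name, tool, and dataset are invented for illustration.]

```python
# Minimal MCP server sketch (assumes the `mcp` Python SDK is installed, e.g.
# `pip install mcp`). The tool and data below are made up for illustration.
from mcp.server.fastmcp import FastMCP

server = FastMCP("civic-data")

@server.tool()
def city_population(city: str) -> str:
    """Look up a city's population in a tiny in-memory dataset."""
    data = {"Paris": "about 2.1 million", "Lyon": "about 0.5 million"}
    return data.get(city, "unknown city")

if __name__ == "__main__":
    server.run()  # speaks MCP over stdio so a client such as Claude can call the tool
```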
17:04The biggest mistake we made, and it kind of speaks to us being more of a startup than an enterprise
17:08so far,
17:09is we kept the authentication very simple out of the box.
17:12And so there was not like a very complex, you know, access control kind of standard.
17:17We kept it very simple.
17:19And as soon as we launched it, Amazon, Microsoft called us.
17:22They're like, in an enterprise environment, you need to know that like this person who works in finance
17:27has access to these five documents, but not these eight, and only for two days.
17:32And like access control and data security are very important in the enterprise.
17:36That's really interesting.
17:36So we had to do a complete overhaul of authentication.
17:39And part of it came from just wanting to keep the protocol simple to start.
17:42But second is, you know, some of those terms like Active Directory, you know, like ACLs,
17:48like they're not native to Anthropic, if you will.
17:50Okay.
17:51Why is Anthropic so text-based?
17:54Like, that is probably the thing that makes you different.
17:55There are lots of things that make you different from the others in the back end.
17:58But from a user perspective, the biggest one is, like, you go, you know, to Google I.O.
18:02And it's, you know, all video creation, and OpenAI,
18:04it's now all image creation and voice.
18:06Why are you guys all text?
18:08Yeah.
18:10I think it's just focus, primarily.
18:10So we're trying to stay really focused on this idea of building safe, generalized intelligence.
18:15And we think the critical path there is, can the model write and read well, which is a lot of
18:21work that we've done.
18:22Can it reason and can it work over, you know, long time horizons where it's managing memory and it's maintaining
18:28context and it's performing actions.
18:29The multimodality piece of, can you generate a video, can you generate an image, they're incredible technologies.
18:36But we think that our model could then call out to those models in more of a partnership way.
18:42And developing that would be a distraction when every single hour of compute right now is allocated for Anthropic,
18:48either serving our customers, doing advanced research, or training the next model.
18:53And none of that is dedicated to multimodality just because in some ways like we can't afford not to be
18:58focused on the most important thing.
18:59Does it make it easier to train it, too, because you don't have to train it on multimodal resources?
19:02It does to some degree, but multimodal understanding is still really important because we believe seeing is still
19:08on the path to generalized intelligence.
19:10And so half of it we still have to do, but it also makes the training simpler on the output
19:15side.
19:16All right.
19:16Well, thank you very much.
19:17Mike Krieger, head of product at Anthropic.
19:20All right.
19:20Thank you all very much.
19:22All right.
19:22All right.