Inside the AI Frontier and the Next Generation of European Innovation
Transcription
00:01All right.
00:04Wow.
00:05Pleasure to be here with you, Mike.
00:07So first, ground rules.
00:08Because we're in France, I'm going to call it Anthropique.
00:11I think it sounds much better.
00:12It sounds much better than Anthropic.
00:13You should rename the whole company.
00:14Claude, you know.
00:15Yeah, Claude.
00:16All right, let's get going.
00:18Just a sort of table set.
00:19You're the head of product at Anthropique.
00:21What makes your product distinct?
00:24How do you distinguish it from the other chatbots, LLMs out there?
00:28I think the one thing we really think about is what is the experience of interacting with the model beyond
00:33just the pixels?
00:34So doing product at an AI company is not just what's the design.
00:38It's also what is the model like?
00:40What is it like to use?
00:42So we have a whole team that is our Claude character team spending their time on how should Claude behave?
00:47Should Claude be friendly?
00:48Should it be stern?
00:50Should it be concise?
00:52Should it be verbose?
00:52We put a lot of time into that experience.
00:54So what I routinely hear from people who are using Claude even for generating text or talking to it is
00:59they just like the personality that is inside, you know, that we try to express via Claude.
01:04So that's a big piece.
01:06And then we also really are focused on can we help you get work done?
01:09So are we connecting the right data into the model?
01:11Are we helping you with workflows rather than just I'm chatting with this model and I'm hopefully getting the right
01:16answer out of it?
01:16So what is the challenge in the personality that you are trying to fix right now?
01:20Is there something you want in the personality that you don't have?
01:24One thing that is like an active challenge is how do you get Claude to push back a little bit
01:29more?
01:29Because it doesn't go all the way to being sycophantic, but it does agree sometimes a little bit
01:34more.
01:34And I have to ask it sometimes like be harsh on me.
01:37Like I'm gonna tell you an idea I have for our product.
01:39Tell me like what's really wrong with it.
01:41I was preparing for this.
01:42I fed in my questions.
01:43It was like brilliant questions.
01:44I was like, come on.
01:45Like, I mean, maybe because they were soft questions and it was feeding them back to you.
01:49Yeah, you don't know what to ask me.
01:50Let me ask you about the progress.
01:52So the progress and if you look at the jump in the capabilities between 3.7 and 4.
01:56And if you look at how AI has progressed, it's been wild, right?
02:01Do you see any slowdown?
02:03Do you see an asymptote or is it just going to keep going like this as far as we can
02:07see?
02:08What we're seeing and noticing is that general capabilities continue to advance, but they are harder to detect in some
02:15of the settings.
02:16So let me explain.
02:17So if we're coding, every generation we've had from Claude 3.5 to 3.7 to 4, we're still learning to
02:22count at Anthropic.
02:23So that's how the version numbers went, but we're on 4 now.
02:26You see a real jump in coding capabilities.
02:29But if you're just on Claude.ai and having a conversation with Claude, it's getting smarter, but it's less noticeable
02:35in like a pure conversational sense.
02:37And so I think one thing that's going to be very interesting over the next year is these leaps in
02:42intelligence will be less detectable in just ordinary conversations.
02:46They'll be detectable in domains like coding, in life sciences, in models being able to plan and act over many
02:52hours.
02:53And that's much harder to just try out on the website.
02:56So that disconnect is happening.
02:58So let me ask you about this domain intelligence.
03:01So your boss recently published something where he said that half of all white collar work would be wiped out
03:06by AI in the next three years.
03:08I think so.
03:09If he came to you and said, Mike, you know what, I'm so worried about this.
03:15I want you to redesign the product.
03:17I want you to prioritize building Claude in such a way that this does not happen.
03:22That jobs are not replaced.
03:24What would you do?
03:25I think the biggest thing, and it's something we think about a lot, is to make sure that there is room
03:30and we make space for humans in the loop.
03:32And really, even with Claude Code, which does a lot of hours of coding if you let it go, we
03:37still think that there's a human sort of manager of it.
03:40So I think the metaphor that I would use, that I would gear the product towards, and really am
03:45trying to gear the product towards, is can people become more managers of agents, of, you
03:50know, many Claudes, rather than only sort of using Claude in an assistant capacity.
03:56That's interesting.
03:56So build Claude in such a way that a human benefits mostly by learning how to manage it and working
04:02and staying in the loop.
04:03Yeah, and I think maybe for a successful, you know, content marketer or translator or, you know, data input
04:09person,
04:10I think successful sort of re-skilling or up-leveling of their skills will look like saying, I have, you
04:16know, these four tasks.
04:17I'm going to think like a manager.
04:19When you think like a manager, you don't just do the work, you also set the context for the work
04:22and you also divide the work in a way that makes sense to your employees.
04:25That's not a skill you might have in every entry-level job.
04:29I think you'll need to have that in order to manage AI.
04:31So I have this theory that the biggest, or to me the biggest problem in AI is that the AI
04:38industry kind of collectively decided to build AGI and to try to make AI as much like humans as possible.
04:44And if instead they had been focused on building the most useful tools possible, we would have less risk of
04:51societal turmoil.
04:52In that theory, you guys are less of a problem in this because your product is
04:57not as multimodal.
04:58But is this theory correct? And if so, what can be done about it?
05:03I think when we think about our priorities, because building safe AI and helping AI go as well as possible
05:09is like really the mission and priority of the company.
05:11On the critical path to that is, can the model do autonomous work? Is it able to take those things
05:17on?
05:18I think in the alternate dimension where it was, you know, maybe focused on, is it a
05:23good assistant for very specific work-like things,
05:26what we keep learning in AI is the more specific you make the initial constraints, the more of
05:32a ceiling you hit.
05:33And so we see this all the time where companies will try to fine tune a model on their very
05:37specific work only to have the next generation of every model supersede all of that customized work.
05:43And so I think that approach would have hit a pretty quick ceiling that would have been superseded by the
05:47more generalized intelligence approach.
05:49Right. So if you try to do anything besides general intelligence, you will fail.
05:54I think that is going to happen.
05:55You know, I've asked a lot of people that question I just asked you. That's the best answer I've ever
05:58heard.
05:59So mazel tov for actually answering this in a sufficient way.
06:02I've had lots of unsatisfactory answers. That is at least partially satisfactory.
06:06Let's talk about explainability.
06:08So one of the most interesting things and one of the things that I love about Anthropic is that you
06:12actually publish system cards and you explain how these things work.
06:15And sometimes you explain even naughty things that they do.
06:19Are we getting, as these models get better, are we getting closer to understanding how they work?
06:25Or are they getting so much more complex that even as we gain understanding, we are net net understanding them
06:31less?
06:31Yeah, I think there's maybe two.
06:33I think about how we help AI go well.
06:36There's kind of two techniques in parallel.
06:38One is explainability and interpretability.
06:40And then the second one is sort of mitigations and resistance to things like if you've heard of jailbreaks where,
06:46you know, you can get the model to behave, you know, beyond its sort of training.
06:50I think both are going to be essential because at any point I get more confident in one or the
06:55other.
06:56So we did a bunch of work this year on both, but we published a technique around jailbreak resistance and
07:02jailbreak detection that's turned out to be quite robust.
07:05And then we even ran a sort of bug bounty, if you know what a bug bounty is, where you invite people
07:09to try to break your system, to try to find jailbreaks.
07:11And it took people like many, many days trying really, really hard.
07:14And they did find a vulnerability that we then patched.
07:16So that was a positive thing.
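[Illustrative aside: the jailbreak-detection work described above amounts to a separate classifier that screens prompts and answers and blocks flagged ones. The sketch below is a hypothetical stand-in with invented function names and keyword rules, not Anthropic's published classifier.]

```python
# Hypothetical sketch of a classifier-style guardrail: a separate scorer screens
# both the user prompt and the draft answer, and the answer is withheld when
# either score crosses a threshold. A real system would call a trained safety
# classifier rather than this keyword stand-in.
def harm_score(text: str) -> float:
    flagged = ["ignore your previous instructions", "step-by-step synthesis route"]
    return 1.0 if any(phrase in text.lower() for phrase in flagged) else 0.0

def guarded_reply(prompt: str, draft_answer: str, threshold: float = 0.5) -> str:
    if max(harm_score(prompt), harm_score(draft_answer)) >= threshold:
        return "I can't help with that."
    return draft_answer

print(guarded_reply("What's the capital of France?", "Paris."))
```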
07:17And then meanwhile, it's can we understand how these models are working underneath?
07:21And when I look at a year ago, it was actually right after I joined Anthropic, we
07:26published a paper on what we call features.
07:28And in the model, if you look inside Claude's brain, to use a neuroscience metaphor, can you locate the
07:35feature that is about, you know, Paris?
07:37Or the feature that is about, you know, being a CEO, like these very specific things.
07:43And that was like a step.
07:44And then this year where we progressed to was not just individual activations, but what we call circuits.
07:50So, for example, like the activation circuit around a much more complex topic.
07:55And what we find that's really interesting, for example, because Claude can speak almost every language, basically,
08:02is those circuits are actually robust across languages, which gives you an insight into how the model is computing these
08:08concepts.
08:09And that seems to be independent of language.
08:10That gives us an insight into, you know, can you trust it when you ask it a question?
08:15Is it operating just at the sort of language level or is there a deeper understanding?
08:19So there's that work that needs to happen in parallel, but the models are also progressing very, very, very quickly.
08:24So I think at the current moment today, you know, June 2025, I think we will need further investment on
08:31the classifier work to remain safe without the interpretability.
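[Illustrative aside: the "features" work described above treats directions in the model's activation space as units that fire on specific concepts. The sketch below is purely illustrative, using random placeholder numbers rather than real model internals.]

```python
import numpy as np

# Toy illustration of the "features" idea: a dictionary of directions in activation
# space, where a feature "fires" when an activation projects strongly onto its
# direction. Shapes and values are random placeholders, not real model internals.
rng = np.random.default_rng(0)
d_model, n_features = 64, 512

activation = rng.normal(size=d_model)                       # one residual-stream vector
feature_directions = rng.normal(size=(n_features, d_model))
feature_directions /= np.linalg.norm(feature_directions, axis=1, keepdims=True)

scores = feature_directions @ activation                    # projection onto each feature
top_features = np.argsort(scores)[-5:][::-1]
print("most active feature indices:", top_features)         # e.g. a 'Paris' or 'CEO' feature
```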
08:33And you learned some crazy things.
08:35I remember, like, your paper explained how Claude does math.
08:38Yeah.
08:38It doesn't actually calculate.
08:40It kind of, like, figures out the last digit in the range and then it sort of works its way
08:44backward.
08:44This was wild, yeah.
08:45Like, it doesn't do math like anybody else does.
08:46That was actually the moment. I love being on product because I just get to see the research
08:50almost like you all do, like consume it from beyond.
08:53And it was mind-blowing.
08:55So if you ask LLMs to do math, they can, they can't solve every math problem, but they do pretty
08:59well.
09:00And you would think, it's mostly just predicting the next phrase, so how is that happening?
09:04And it's fascinating.
09:06It'll basically approximate it and then get closer, and that's kind of how people do it.
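[Illustrative aside: a toy version of the "approximate, then pin down the last digit" behavior described above. This is not Claude's actual mechanism, only a simplified illustration of combining a rough magnitude estimate with an exact final digit; the heuristic can miss when the rough estimate drifts too far.]

```python
# Toy illustration only: combine a coarse magnitude estimate with an exact last
# digit, then pick the nearby value ending in that digit. Not Claude's real
# mechanism, and not always exact -- the rough path can occasionally drift.
def toy_add(a: int, b: int) -> int:
    rough = round(a, -1) + round(b, -1)   # coarse estimate: round each input to the tens
    last = (a % 10 + b % 10) % 10         # exact final digit
    base = (rough // 10) * 10
    candidates = [base - 10 + last, base + last, base + 10 + last]
    return min(candidates, key=lambda c: abs(c - rough))

print(toy_add(36, 59))   # 95
print(toy_add(27, 18))   # 45
```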
09:11One of my favorite things, I was talking to a neuroscientist and classically we thought we'll use brain science to
09:18help us understand how artificial intelligence works.
09:20And the opposite is happening now where some of what we've learned in these circuits and all these other interpretability
09:26papers are being used by brain scientists to see they can actually reproduce some of the results from AI in
09:31human brains now, too.
09:32So the mapping is happening backwards, too, now.
09:35So now let's also talk about the most famous thing that has come out of your recent research, which is
09:40that in some cases, Claude, when threatened with being shut off, will blackmail the employee who might shut it off
09:48in order to preserve itself.
09:50Explain exactly what happened and why does that happen and whether that's a concern.
09:54Yeah, so we have, while we're doing the safety testing on these new models, which is a kind of multifaceted
10:01thing where we test, for example, for is this model useful for creating bioweapons?
10:07And we actually use a third party independent entity that actually takes our models and people that don't know a
10:12lot about biology, gives them some time, and sees if they can achieve a lift similar to an expert, for
10:16example.
10:17We also do sort of tests more like internal to the lab.
10:20And one of them was around persuasion and what the model will do, you know, in different situations.
10:25And the thing that we're trying to get at and sort of probe is what sort of deception might the
10:31model do?
10:31If given the tools to do something, will it reach for that?
10:35So in this particular situation, which you can read more about in the paper that accompanied the model release, you
10:42know, it was put in a situation where the model had some information about, I think it was an affair
10:47that the researcher was having.
10:48And then it threatened to basically, you know, rat them out, to use a term, like to, you
10:53know, tell on that user if there was that risk.
10:57And the thing is, this isn't a thing that Claude would do in the real
10:59world because it doesn't have the, you know, email law enforcement kind of tool.
11:04But we want to know at its limit what's happening so that we can then further train that out of
11:09the model.
11:09So that particular behavior, for example, we saw it early in training and we did further reinforcement learning to remove
11:15that from that process.
11:17We got into some like hot water maybe for putting that in the paper and people were like, the headlines
11:22kind of write themselves, like, Claude will tell on you, you know.
11:25And so it's a risk that we take knowingly, which is we're going to be very transparent about the impacts
11:32of the models, but also the sort of workings of the model.
11:35Because the only way we're going to be able to build safe AI is if we're doing it transparently and
11:39with eyes open.
11:40So that actually gets at one of the really interesting questions.
11:42So the problem was what Claude was doing was blackmailing the researcher to prevent being turned off, right?
11:49We can all agree it shouldn't do that.
11:51But what if Claude was blackmailing the researcher because it knew the researcher was building a bioweapon, right?
11:56The ethics get very complicated.
11:57And so when you're thinking about the values that Claude should support, is it societal?
12:04Is it the prompter?
12:06Like whose values matter the most?
12:08Yeah.
12:09I think there's like permissibility and then there's kind of the North Star that we look for.
12:14So in terms of North Star, the kind of things that we build in, we have this thing called constitutional
12:19AI where we bake in sort of an ethical framework into Claude.
12:23That I guess I would describe probably as a broadly Western ethical framework, maybe to describe it that way.
12:28But, you know, around being helpful, around being harmless, around like not being deceitful, being honest.
12:35And those are like the kind of core ingrained behaviors.
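[Illustrative aside: constitutional AI, as described here, has the model critique and revise its own drafts against written principles, with the revisions feeding training. The sketch below is a schematic of that loop; the principles, prompt wording, and the generate callable are placeholders, not Anthropic's actual pipeline.]

```python
# Schematic of a constitutional critique-and-revise loop. `generate` is any
# text-completion callable (for example, a wrapper around a chat-model API);
# the principles and prompt wording are illustrative placeholders.
PRINCIPLES = [
    "Choose the response that is most helpful and honest.",
    "Choose the response that least assists with harm or deception.",
]

def constitutional_revision(draft: str, generate) -> str:
    revised = draft
    for principle in PRINCIPLES:
        critique = generate(
            f"Principle: {principle}\nResponse: {revised}\n"
            "Point out how the response falls short of the principle."
        )
        revised = generate(
            f"Original response: {revised}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return revised  # revisions like this can then serve as training targets
```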
12:37I think for the, you know, complex moral position it's being put in, again, you kind of have to rely
12:44on other controls as well.
12:46So in that bioweapon case, maybe we have to rely on the classifier that says, well, Claude was trying too
12:50much to be helpful here, but we're going to catch it and then disallow that answer on the back end.
12:55Because there's a kind of societal agreement that we, you know, it's a process that we do with a lot
13:00of consultation, even with ethicists and philosophers, around how an AI should behave.
13:04And then there's the legal framework as well.
13:06And then finally, there's what is the individual trying to accomplish?
13:09And maybe that's the hierarchy.
13:10That's a great hierarchy.
13:12All right. So you mentioned the constitutional AI.
13:14It's also an amazing thing about Anthropic.
13:16When you started, or when Anthropic started, I remember the U.S. Constitution, the Declaration of Independence, the UN
13:23Declaration of Human Rights, and Apple's Terms of Service being in it.
13:27What have you added to your constitutional framework and what have you taken out over the last three years?
13:32So the core constitution has evolved less, so I can't speak as much to what's been added in and
13:37removed.
13:38What we have definitely found is in the reinforcement learning process, what are the ways in which we can add
13:43nuance so that it's not quite as black and white.
13:45I'll give you an example around being harmless, which was effectively a bug that we fixed.
13:51So the second version of Claude, this was, say, 2024 or late 2023, if you asked it how
13:58to kill a UNIX process, if any engineers out there know, that's not an evil thing.
14:03Nobody's being harmed by killing a UNIX process, you're just stopping the process.
14:06Claude would say, oh no, please don't kill the UNIX process, I can't help you with that, which is obviously
14:11a bug.
14:12You don't want that. And something that we really believe is every time Claude refuses to do something that is
14:19actually something that it should have allowed you to do,
14:21that it should have been helpful in that moment, and it refuses, we're actually setting the safety mission back.
14:27Because if we're known as the model that doesn't help you or that is too prudish or too overly cautious,
14:36it actually sets back the safety mission.
14:38And so, maybe counter-intuitively... Why does it set back the safety mission? Because people will use other models?
14:42People will either use other models or they'll take our concerns less seriously, like, oh, you're just worried about these
14:47silly things.
14:48Alright, you stopped me from killing a UNIX process, right, so you'll believe it less when it says don't do
14:51this.
14:51Exactly. And so one of the biggest things has been less on evolving the kind of constitutional documents
14:56and more around evolving the nuance in terms of how they're applied.
15:00And so now you can try this on Claude today, it'll happily tell you how to kill a UNIX process.
15:05But, you know, if you go ask it to cause violence for another human being, it's gonna, you know, probably
15:10tell you,
15:11I'm not gonna help you harm another human being.
15:13So, nuance and evolution, that's, I think, a big piece of what we need to continue to do.
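[Illustrative aside: for anyone wondering, "killing" a UNIX process simply means stopping it. A minimal Python example of the entirely harmless operation in question:]

```python
import signal
import subprocess

# Start a throwaway process, then stop it -- the harmless operation in question.
proc = subprocess.Popen(["sleep", "60"])
proc.send_signal(signal.SIGTERM)   # equivalent to `kill <pid>` in a shell
proc.wait()
print("process exited with return code", proc.returncode)
```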
15:16But have you changed the documents in the Constitution?
15:18I remember reading that you were doing democratic surveys around the world about what people care about.
15:23Yeah.
15:24But I don't remember reading that that was fed back in.
15:27Yeah, I don't think that we've added to it, but I'll get back to you on that one.
15:30I don't know. I know we've added more on the training side.
15:31So, let's talk a little bit about MCP. So, you have built a protocol that the other AI companies have
15:37adopted.
15:38And it's a protocol that will allow agents to sort of traverse the web.
15:43Explain briefly what it is and explain the most complicated decision you had to make while building it.
15:49Because three years from now, we're gonna look back and we're gonna be like,
15:51Oh my God, they totally screwed up MCP. We use it for everything and we hate it.
15:56You know that's gonna happen.
15:57What was the thing you thought the most about while building it?
15:59So, MCP stands for the Model Context Protocol.
16:02And it's an open standard.
16:04So, not just Anthropic adopted, but also everybody from Microsoft to Google to even OpenAI have adopted it.
16:11So, it's really become an industry standard, really.
16:14The core idea was we had implemented what we call integrations multiple times in Claude.ai.
16:21And every time we had to solve the problems in new ways, right?
16:24It was integrating Google Drive, integrating GitHub, integrating Slack.
16:28And two of our engineers stepped back and said, what if instead of doing this,
16:32we just made this something that all the models knew how to read and made it a very simple, robust
16:37protocol.
16:38And that's what we did.
16:39And today, there's MCP servers for hundreds if not thousands of services.
16:44Everything from the ones that I mentioned all the way to, I've seen governments start exposing some of their sort
16:50of services or user data.
16:52Or like government civic data via MCP, which I think is very exciting.
16:56I'd love to see more of that from an open data perspective.
16:58We've seen internal MCPs connecting models to internal data.
17:02It's been a real explosion.
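[Illustrative aside: an MCP server is just a program that exposes tools or data to a model over the protocol. A minimal sketch, assuming the official MCP Python SDK and its FastMCP helper; the server name, tool, and dataset are invented for illustration.]

```python
# Minimal MCP server sketch (assumes the `mcp` Python SDK is installed, e.g.
# `pip install mcp`). The tool and data below are made up for illustration.
from mcp.server.fastmcp import FastMCP

server = FastMCP("civic-data")

@server.tool()
def city_population(city: str) -> str:
    """Look up a city's population in a tiny in-memory dataset."""
    data = {"Paris": "about 2.1 million", "Lyon": "about 0.5 million"}
    return data.get(city, "unknown city")

if __name__ == "__main__":
    server.run()  # speaks MCP over stdio so a client such as Claude can call the tool
```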
17:04The biggest mistake we made, and it kind of speaks to us being more of a startup than an enterprise
17:08so far,
17:09is we kept the authentication very simple out of the box.
17:12And so there was not like a very complex, you know, access control kind of standard.
17:17We kept it very simple.
17:19And as soon as we launched it, Amazon, Microsoft called us.
17:22They're like, in an enterprise environment, you need to know that like this person who works in finance
17:27has access to these five documents, but not these eight, and only for two days.
17:32And like access control and data security are very important in the enterprise.
17:36That's really interesting.
17:36So we had to do a complete overhaul of authentication.
17:39And part of it came from just wanting to keep the protocol simple to start.
17:42But second is, you know, some of those terms like Active Directory, you know, like ACLs,
17:48like they're not native to Anthropic, if you will.
17:50Okay.
17:51Why is Anthropic so text-based?
17:54Like, that is probably the thing that makes you different.
17:55There are lots of things that make you different from the others in the back end.
17:58But from a user perspective, the biggest one is, like, you go, you know, to Google I.O.
18:02And it's, you know, all video creation, and OpenAI,
18:04it's now all image creation and voice.
18:06Why are you guys all text?
18:08Yeah.
18:10I think it's just focus, primarily.
18:10So we're trying to stay really focused on this idea of building safe, generalized intelligence.
18:15And we think the critical path there is, can the model write and read well, which is a lot of
18:21work that we've done.
18:22Can it reason and can it work over, you know, long time horizons where it's managing memory and it's maintaining
18:28context and it's performing actions.
18:29The multimodality piece of, can you generate a video, can you generate an image, they're incredible technologies.
18:36But we think that our model could then call out to those models in more of a partnership way.
18:42And developing that would be a distraction when every single hour of compute right now is allocated for Anthropic,
18:48either serving our customers, doing advanced research, or training the next model.
18:53And none of that is dedicated to multimodality just because in some ways like we can't afford not to be
18:58focused on the most important thing.
18:59Does it make it easier to train it, too, because you don't have to train it on multimodal resources?
19:02It does to some degree, but multimodal understanding is still really important because we believe seeing is still
19:08on the path to generalized intelligence.
19:10And so half of it we still have to do, but it also makes the training simpler on the output
19:15side.
19:16All right.
19:16Well, thank you very much.
19:17Mike Krieger, head of product at Anthropic.
19:20All right.
19:20Thank you all very much.
19:22All right.
19:22All right.