Beyond the Glitch: Understanding and Mitigating AI Hallucinations

Vivatech

Transcript

00:00Hello, everybody. My name is Natasha Bernal. I'm Senior Business Editor at Wired.

00:06I am joined today in a very exciting panel, which is talking about AI hallucinations.

00:12Now, I do want to involve the audience, so be prepared.

00:17But first, I'm going to introduce my panellists.

00:19So, we've got on the end Dr. Amar Awaldala, who's the CEO and founder of Victara.

00:25We've got in the middle, Sasha Rubel, the head of AI and generative AI policy, EMEA at AWS.

00:31And right next to me, we've got Dr. Giada Pistilli, the principal ethicist at Hugging Face.

00:37Thank you very much for joining me.

00:39Okay, so my one and only question to the audience is just one way you lift your hand.

00:43So, I want to find out if any of you have asked any generative AI, any LLM, a question and

00:49have received the wrong response.

00:51Lift your hand if you've got the wrong, I love this, you guys are great.

00:56So have I.

00:57If you want, I'll tell you the story.

01:00I asked generative AI to book me the perfect day out in London.

01:05I ended up going to a really ratty looking, very awful Indian vegetarian place in the middle of nowhere.

01:13Then travelled across London to go on a bike in the middle of the Olympic Park in the rain.

01:18It was awful.

01:19It knew where I was going, but it didn't care.

01:23Now, obviously the stakes in that example are very, very small, but the stakes we might be talking about here

01:29are very, very high.

01:31So, I guess I want to start with a premise that I've seen time and time again online, which I

01:36want you all to address, if that's all right.

01:39LLMs will always hallucinate, and we just need to accept it.

01:44Let's start with Sasha.

01:46Is that true, or is it false?

01:47So, true, and I will add that I tested a model once that said that I was a man and

01:52a chess coach.

01:53So I can confirm that there are some errors sometimes, but I would nuance my answer of true with a

01:58couple of things.

01:59And Amr's t-shirt, which I hope you all can see, which says rag against the machine, is what I

02:04really want to emphasize,

02:06which is that true, LLMs will hallucinate, but also true, there are lots of really exciting things that we can

02:11do using AI as a technology and other kinds of innovations to actually address those hallucinations.

02:17And it also points to the fact that, for example, if you go to a library, you look online or

02:22you use search, you're going to verify the information that you find on that search.

02:25So at the end of the day, we really need to be investing in digital literacy and digital skills to

02:30help people be able to use these tools in a constructive way, but also critically engage with them.

02:36So I don't believe everything you read.

02:38Tiada?

02:38Yeah, so I agree 100% on what has been said.

02:41And I think I'd also like to add that if we start also having better evaluation, better metrics, and better

02:49data, then it can also be nuanced.

02:52That, of course, it's something that we can help mitigate.

02:55And sure, it's still like a big problem to address.

02:59We don't have like a silver bullet, but if we do, if we start doing and maybe taking a little

03:05bit more of time to do things in a better way, then we can find also solutions, yes.

03:11All right, Amo, take us away.

03:12Yeah, so I agree and disagree with the way you put it.

03:16So you said RAG will always hallucinate, and we have to accept that.

03:21I agree with RAG, sorry, LLMs will always hallucinate, and we have to accept that.

03:26I agree with LLMs will always hallucinate.

03:28And the reason why they will always hallucinate is because LLMs at their heart are probabilistic.

03:33There is a probability, and whenever you have a probability, there is going to be a likelihood of error.

03:38It's just how it is.

03:39It has been improving a lot, but at some point it will level off.

03:43And it is leveling off right now where we're not able to get it to be better, any better.

03:47So clearly everybody now seeing hallucinations will always be there.

03:52Now, but we don't have to accept it, right?

03:54In the same way that if you have a human who, on average, is an amazing doctor, but every now

04:00and then makes a medical diagnosis mistake, you still use that doctor.

04:04But then you have a fact checker beside him to catch him when he does the mistake, right?

04:08And I think that's what we need to evolve to as an industry, to figure out when can we trust

04:13the answer.

04:14This is a good answer.

04:15I can trust it.

04:16Let's deploy it.

04:17Versus, oh, this answer might be off.

04:19Let's have a human expert now help the machine make the proper call.

04:24I think it's interesting that you bring up a doctor because I personally, if I knew about it, wouldn't use

04:28the doctor at all, to be honest.

04:30I would just be like, hell no.

04:31But I do think it's interesting because obviously we're talking about maybe quite low stakes.

04:36So if I ask, you know, for a day out versus I'm asking for decisions about a company, it's a

04:42very, very different kind of scenario.

04:43So I guess I wanted to talk about hybrid AI a little bit as well.

04:47With the growth of use of agentic AI in businesses, what are the risks that we're looking at here moving

04:55forward?

04:56Maybe, Jada, do you want to weigh in on this one?

04:59Yeah, sure.

05:00So also to follow up, I completely agree, especially knowing that the best scenario where also not only LLMs, but

05:07AI tools in general work best

05:10when they work together with experts who know their field and they can not only fact check, but also exploit

05:18them in a better way.

05:19And so that also calls for better evaluations, as I was saying.

05:23And so to answer your question, I agree that not all of the contexts are the same.

05:28So, for instance, if we take health care, as you were mentioning, and, for instance, you have hallucinations for diagnosis,

05:35then it can have, like, really big problems, and not only ethical problems, but also legal problems, and so challenges.

05:42Or, for instance, if you're a student or a researcher, and then your LLM starts fabricating fake references, for instance,

05:52then it can become quite problematic, or, I don't know, maybe another use case could be, like, in a legal

05:58context,

05:58where you're a lawyer, and you're building up your case, and then it starts, like, and that already happened in

06:04the past,

06:05and it starts, like, hallucinating completely invented and fake case, like, references,

06:12then it can be quite problematic, especially if you're trying to build in your argument on top of it.

06:17And so, yeah, it really depends, once again, on the context, and I think that it's still, like, kind of

06:25a hard problem to solve in that sense.

06:27It's funny you mentioned the legal sector, because I think two lawyers in the UK were reprimanded recently

06:32for using AI-generated content in their arguments, and I've got here written down a good example of AI hallucinations

06:40in law.

06:41There is a person called Damien Charlotene, who is a lawyer and also a data scientist,

06:47who says he's collated 120 examples of proceedings that have hallucinated because of AI-generated content.

06:54And we're talking about quite serious things.

06:58I mean, obviously, it's just a small example of something that, for many, isn't still a widespread use.

07:04And having covered the legal sector, I can tell you they're very careful people.

07:08So, for this to be an example in that sort of more traditional sector,

07:12it kind of shows a bit of a drop in the ocean of what might be happening elsewhere.

07:16I guess...

07:17Can I just add something?

07:18Yeah, absolutely. Yeah.

07:20So, first, very quickly, on the lawyer thing that happened in the U.S. many times as well,

07:24and one of the funniest ones was when the judge was confronting the lawyer and said,

07:28have you not checked your work?

07:30And he said, yes, I asked ChatGPT to check the work.

07:33It's like, dude, so you asked ChatGPT to make the proceeding, which then included the hallucinations,

07:39and then you expected it to correct itself, which obviously shows that there is a lot of education

07:43that our industry needs to go through and how these systems can be used.

07:47But your question also highlighted another key thing, where you were saying,

07:51what are some of the issues that we see when we deploy these technologies?

07:54And hallucination and quality was one key issue.

07:58But another very big key issue that all of us are working very hard on addressing is security.

08:04These systems are very, very susceptible to something called prompt attacks.

08:08Maybe show of hands if anybody heard about prompt attacks.

08:11About 20% of the room raised their hand.

08:14So for the remaining 80%, prompt attacks for large language models is just like social engineering attacks for us humans.

08:21When somebody calls you up at home, pretends to be from your bank to get your PIN number,

08:25and then steal all of your money.

08:27With large language models, you can sometimes pressure them in a way

08:30where they will give you information that you're not supposed to see.

08:34And that can be a very, very big problem from a privacy perspective.

08:37What do you think about this idea of machine unlearning to kind of remove some of the hallucinations from existing

08:46data sets?

08:47Sasha, perhaps you can weigh in on that one.

08:49So conceptually, very exciting.

08:51It's technically very challenging.

08:53And just to kind of rebound on what the two of you are sharing,

08:55I think there are two things that are really important in terms of when we're thinking about machine unlearning,

09:00how they can be complemented by other approaches.

09:03One is the importance of open source for advancing research around safety and responsibility.

09:09And just a huge shout out to what you're doing, Jada, your research and hugging face.

09:13And we're very excited to be partnering you on that because it's about democratizing access,

09:17but also about democratizing the opportunity to advance the research around some of the biggest risks that exist.

09:23And you highlighted them.

09:24Accuracy, security, questions around bias and fairness.

09:28So that's an approach that I think is complementary to the machine unlearning.

09:32And then the other aspect, one of the things that we're seeing is that if you talk to organizations and

09:37companies,

09:38but also if you talk to policymakers and you ask them,

09:40do you think transparency, accuracy, fairness, do you think these are important principles?

09:45Every single person will say yes.

09:48But if you talk to an engineer developing the solutions,

09:52they're going to say, those are really great words, but exactly what do you want me to do as a

09:55developer and a deployer?

09:56And actually, we undertook research that showed that around 85% of organizations see these questions around mitigating hallucinations,

10:05ensuring accuracy as top priorities.

10:07But actually, only 25% of them have governance in place internally in order to address these challenges.

10:13And so really focusing on providing guidelines, which is what we spend a lot of time doing with both of

10:18you,

10:19and very happily, is really important because otherwise these stay principles.

10:23And I think you did a really good job at highlighting the importance of human oversight.

10:27One of the most important and insightful things that I've heard in conversations around these issues

10:33is that we're asking for 100% accuracy from probabilistic machines.

10:39And that artificial intelligence is not something that's magic.

10:42What you need to be asking and the way in which you need to be framing the question

10:45is, is this solution more accurate than humans?

10:49And how can it work together to augment human intelligence and provide more safety in the long run,

10:55especially when it's deployed in high-risk scenarios like the health situations that you were talking about?

11:00Can I ask, just to follow up on what you're saying,

11:02you said that 25% of businesses have some sort of guidelines in place to try to mitigate hallucinations.

11:08What are the rest of the 75% doing?

11:11Good question.

11:12I think a lot of people are stuck at the ideation phase of saying,

11:15this is something that's really important, but we're struggling on how to translate that into practice.

11:20We actually just published research that we commissioned called Unlocking Europe's AI Potential

11:24that looks at the real blockers to AI adoption.

11:27And regulatory uncertainty was one of the biggest blockers to AI adoption.

11:31And that includes understanding how to translate principles for responsible AI governance,

11:36specifically into practice.

11:38But I think there's also an interesting debate, just zooming out for a second,

11:41around questions around responsibility and innovation.

11:44And it's a fake debate, which is, do we innovate responsibly,

11:48or do we innovate fast and with velocity and with excellence?

11:52And actually, one of the things that we saw as a blocker

11:55is the lack of trust in this technology because of things like hallucinations.

11:58So you need to start, and we need to help organizations.

12:01It's one of the things that we're doing with our Generative AI Innovation Center in France and beyond,

12:06really help organizations move from an approach to responsibility that's kind of bolted on to built in to by design.

12:13How do we build systems by design that are responsible and take into consideration

12:18some of the great innovations that you've been building, for example, from the start?

12:21So it's not an afterthought. It's the starting point.

12:23Amir, you want to share something?

12:25Yeah, I was going to say on the concept of the machine unlearning,

12:27which is a very, very important concept, actually.

12:29And we are at the beginning of creating a very big problem for all of us,

12:33is all of us right now, we are leveraging these technologies to generate content

12:38that we're putting in our websites and videos that we're putting on YouTube.

12:41And then the machine is crawling that content to learn the new version of ChatGPT

12:47and the new version of, I apologize, what's the AWS model's name?

12:51Nova.

12:52Nova, et cetera, et cetera.

12:53So now the machine is learning from its own garbage, quote-unquote,

12:57and that can amplify the problem significantly.

12:59So one important thing that we all need to be doing,

13:02and I give kudos to Google now for leading that effort,

13:05is making sure that we have a signature or ID inside of any AI-generated content

13:11so that the machines, when they're learning the next time,

13:14they don't pick that up.

13:15They only pick the human content, which is incrementally new, good stuff for them.

13:19And that needs to be embedded in all of the content that we are generating

13:22to avoid falling in that pitfall.

13:25Jed, I wanted to ask you something about this responsibility and these pitfalls.

13:31AI hallucination happens.

13:32Who is responsible?

13:33Is it the people who built the LLM?

13:36Is it the people who use it?

13:37Is it a victimless crime, but somehow not really?

13:42What's the deal there?

13:44You're asking the big question.

13:46I know.

13:49So I see responsibility in that case really as a spectrum

13:53because you can say, for instance, it's just the end user responsibility.

13:58They should be checking all the time the generated content.

14:02For instance, we know and we see all the time, for instance, on general purpose AI systems

14:07that it's always written, you should double check because these LLM, blah, blah, blah,

14:13can fabricate some content.

14:15But we already know that in design, when you see something all the time,

14:19at some point, the human eyes start to just not seeing them anymore.

14:22And so I will be happy to ask all of you, how many of you still see on ChatGPT or

14:29Claude or whatever,

14:30the small sentence saying, oh, you should always double check everything that's generated.

14:37Do you guys see that?

14:39Have you noticed it online saying you're responsible?

14:42Making them smaller and smaller.

14:44It's smaller and smaller, but really it's something that it's known in design

14:47that at some point you just don't see it anymore.

14:50And so I believe we've been talking about education and human oversight.

14:55That's still super important, but it's also important to put the responsibility

15:00or at least the shared responsibility also on the shoulders of developers.

15:04and the industry, we're all responsible because we can't just say, okay, it's all on the users

15:10because once again, and something that I think it's also important to mention is that

15:16hallucinations, as we say, the devil hides in the details.

15:23And so sometimes you can also double check maybe twice the same text and maybe something missed.

15:30Maybe there's this fabricated reference that really sounded like this author that you know by heart

15:37because they already work on the same subject and the same topic, but just that specific paper doesn't exist.

15:44And so it also becomes, for human eyes, very hard to spot them.

15:49And so that's why I think developers should also be really transparent on their technical documentation

15:56and say, okay, especially on this specific topic, for instance, there could be maybe more hallucinations

16:04or, and that's why I was really stressing the importance of evaluation in that case,

16:08because I agree, you, it's for me a false dichotomy to say either you can be innovative,

16:17but, or either you can be responsible.

16:19It's something like responsible AI, it's really something that can foster also your innovation.

16:24And so if you think ahead and from scratch about those stuff, then those same principles,

16:30they don't stay just vague ideas, but they can really be something grounded in your right project.

16:37Yeah, so, so I think the courts have ruled already.

16:39Like, if you are using it in business, it's the responsibility of the business, 100%, right?

16:45So there is a very popular story of, I think it was Air Canada, I'm sorry if I got the

16:49details a bit off,

16:50and Air Canada launched an LLM that was part of their sales process for customer support and so on,

16:56and a customer was able to get an airline ticket that is a business class ticket for $1.

17:01And then Air Canada said, we're sorry, that was a hallucination from our chatbot, we're not going to honor that.

17:06And that customer actually sued them all the way to the Supreme Court in Canada,

17:10and the Supreme Court said, no, it's your responsibility as a business that you made this mistake.

17:15If this was a human and they offered it, then you have to live up to it.

17:18It's not the responsibility of the end customer.

17:22So while in AI consumer chatbots like Claude and Nova, when I'm using it directly, or Gemini,

17:27they can have that footnote and put it back on me,

17:30if now I'm using it as part of my business to do things, then the bar is way, way higher.

17:35I think just circling back to what Jada was saying with regards to responsibility and whose responsibility is it,

17:40and your anecdote really shows where does responsibility lie?

17:43For us, the approach has been about a shared responsibility model,

17:46because I think developers and deployers and end users have really different responsibilities

17:51and can actually do different things as it concerns mitigating these risks.

17:56So for example, as a developer of solutions and also hosting third-party models on our Bedrock platform,

18:02one of the things that we've seen is that with Anthropics model Claude, for example,

18:05if you ask Claude a question and you say, if you are uncertain about it,

18:10indicate that you are uncertain, and interacting with it as if it's a human being,

18:15it actually encourages the model to say, actually, I'm not 100% sure about this data.

18:21And then I think one of the things that I find the most exciting personally with regards to

18:25what's going on in innovation in this space over the past 20 years is the ways in which

18:29we can put, for example, output filters,

18:31or you can do with RAG context grounding, or reference checking,

18:36or guardrails that can mitigate not only for hallucinations,

18:39but also things like harmful content.

18:41What C2PA is doing in terms of watermarking and developing standards for watermarking,

18:46I think is really important because if AWS does watermarking in one way,

18:50Google and Microsoft does in another, Anthropic in another,

18:53then it's going to be really hard for end users to understand what is the ground truth.

18:57And so these kinds of international standards,

19:00we see as playing a really key role in building trust in the technology.

19:03So there's a lot of things that developers can do.

19:05And then for deployers and end users, again, going back to the point on education,

19:09I think this is a really essential factor that's often overlooked.

19:13We look to technology for answers to the challenges of technology,

19:16but at the end of the day, it's people using this technology.

19:19And so we also need to look to the people to advance the science

19:21and also make sure that they're skilled enough to engage critically

19:24with this kind of content and tools.

19:26So you mentioned a few of the solutions that have been deployed

19:29to try to mitigate some of these hallucinations.

19:31I guess RAG is a good example because it's relatively cost efficient, isn't it?

19:35It's very good.

19:37It does help to mitigate things.

19:39But none of these things, as far as I understand it, are fallible, right, Sasha?

19:42I mean, it's not like they...

19:44Infallible, pardon me.

19:45It's not like they are completely able to wipe out any hallucination completely, right?

19:51I mean, that's the problem that we're facing.

19:53Are there any solutions out there that are likely,

19:55even in the future, to be able to achieve that, do you think?

19:58So I think we have to work towards that.

20:01Going back to the introduction in terms of the question of are LLMs fallible,

20:05yes, and do they hallucinate?

20:07Yes.

20:07It goes back to what AI is as a technology,

20:09which is probabilistic and statistical reasoning.

20:13And so working on technologies that can address the gaps and the risks related to that

20:17and complement with human oversight is really important.

20:20Is there any technology right now as a standalone solution that can mitigate hallucinations?

20:25No.

20:25Is it really important that we have a combination of these different approaches,

20:30like guardrails, like output filtering, like RAG and context grounding,

20:35like reference checkers?

20:36And the filtering, I think, is particularly exciting.

20:39These are things that, combined with the human oversight,

20:42mitigate the risks related to hallucinations.

20:45May we get there one day?

20:47I would say as an eternal optimist and someone committed to responsible AI

20:51and the collective research done through open research and open source,

20:55do I think we'll get there one day?

20:57Maybe.

20:57I'm hopeful.

20:58Are we there right now?

20:59No.

21:00Do we need to make sure that human oversight stays a really essential principle

21:03in the ways in which we're deploying these technologies?

21:06100%.

21:08So, the answer is mitigate.

21:11The answer is mitigate.

21:12It's how can we have this technology work hand in hand with the humans

21:17to increase our productivity, right?

21:19So, let's pick something simple like customer support, right?

21:22Which, by the way, this technology is flipping customer support in its head

21:26and call centers everywhere across the world right now.

21:29If I know with very high confidence that this response coming back from the agent

21:34is 99% accurate, the specific response, I don't mean on average,

21:38I mean this response, this answer that's coming back right now is 99% accurate,

21:42then I can have that go back to my end user directly.

21:45And then I want to catch the 1% where I think I'm not sure that is accurate

21:49and then give that to my call center operator, the human, now to deal with it.

21:53If we achieve that alone, then we increase our productivity by 100x, right?

21:59So, that's what we're aiming for as an industry right now,

22:02is how can we have the confidence to know when it can go

22:06and when, no, it needs to be reviewed.

22:08And if we can do that accurately, then we're in a very, very good position.

22:12That's my view on it.

22:14I will highlight a very key dashboard that I would like folks to take a look at.

22:18If you go back later and search for hallucination leaderboard on any search engine,

22:21you will get a leaderboard that we publish at Victara.

22:25It's now becoming the industry standard for hallucination rates of different models

22:28in collaboration with Hugging Face, by the way,

22:30that shows the rates of hallucination of the different models

22:33so you can know going in how likely is it that that model will make up something.

22:38Hamer, can I just follow up with a question on those guardrails and that safety?

22:42I guess from a business perspective,

22:44a lot of people are at the moment working with off-the-shelf AI

22:48that they're trying to deploy to use as best as they can for their businesses.

22:53Within those guardrails, how important...

22:57No, that's not the question.

22:58Realistically, how much do you think people can detect for themselves

23:02whether AI is hallucinating

23:03and how important do you think that will be moving forward as a function?

23:07Extremely. That's the whole point.

23:08I think if there's one thing we'd like the audience to leave from today

23:11is if you're deploying Gen.AI in a mission-critical business application,

23:16you cannot have that without...

23:18We call it guardian agent.

23:20Without a guardian agent that is monitoring the output of these systems

23:24and making sure that the answers are correct and secure.

23:28If you do it without that,

23:30like if you go back to the stats you were sharing earlier,

23:32you were saying 80% are doing it,

23:34but only 20% are correcting for catching it,

23:37like doing the real things to catch it.

23:38That remaining 60%, all of them will fail.

23:41Like literally, they will launch their products

23:42and within a week, they will shut it down

23:44because all it takes is just one bad hallucination

23:48or one bad security leak to shut down a project.

23:51And that's exactly what's happening.

23:53And that's why our voices, we're trying to speak

23:55and scream with the loudest we can,

23:57is if you want to succeed with your Gen.AI,

24:00start with this at the beginning, not at the end.

24:03Would you agree?

24:04A hundred percent.

24:04And I think that one of the points that you raised,

24:07which is absolutely essential,

24:08is that can we go really far?

24:10For example, the bedrock guardrails

24:11that we have for our solutions,

24:13it mitigates the hallucinations

24:14and reduces them by 75%.

24:16It's a huge percentage,

24:18but do we need to go further still

24:19and do we need to make sure

24:20that there's a human in the loop?

24:22A hundred percent.

24:23I think also just,

24:24and you inspired it in the way

24:25in which you framed the conversation,

24:27President Macron said something

24:28that I think is really important

24:29speaking to your productivity question,

24:31which is,

24:31do we need to focus on misuse of the technology?

24:34Yes.

24:35Do we need to advance the research

24:36around safety and responsibility

24:38and collectively commit to that beyond industry?

24:41Yes.

24:42But we also need to focus on

24:43what misuse of this technology represents.

24:45And you evoked a couple of examples

24:47where there are productivity and efficiency gains

24:49that are remarkable,

24:50but also at the end of the day,

24:52it's also about the competitiveness of countries,

24:54of startups that are in France and elsewhere.

24:57And adopting this technology in a responsible way

24:59is actually the key to mitigating those risks

25:01and also making sure that the gains that exist

25:03both economically but also socially are there.

25:06Can I ask about trust?

25:08Because it's been brought up time and time again

25:11on this panel.

25:12I was talking actually to the head of innovation

25:14at Capgemini earlier today

25:16who said to me,

25:17Natasha,

25:17would you get into a driverless car

25:18that says it's 94% good at driving?

25:22And I said,

25:23no.

25:25And he said,

25:25but would you get on a plane?

25:26And I said,

25:27yes.

25:27And he said,

25:27well,

25:27you know,

25:28we've had a horrible news

25:29about a plane crashing today.

25:31Why would you do that?

25:32And I sort of didn't really have an answer to that.

25:35And I thought it was a really good question.

25:37I guess the inherent psychology of trust

25:42is at play here

25:43with Gen AI and with hybrid AI,

25:46with people needing to trust

25:50the tools that they're using.

25:51But finding that trust

25:53and gaining that trust

25:54is intensely damaged by any hallucination.

25:57One hallucination for me

25:58is enough to stop me

25:59from using something.

26:00I don't want to use something

26:01that's going to be mistaken

26:02because it'll make me look stupid.

26:04So how do you think people

26:06are working on gaining that trust?

26:08And do you think that trust

26:09will ever be achieved,

26:12I suppose,

26:12Jada?

26:13Yeah.

26:14Once again,

26:15it's also a complex question,

26:17I think,

26:18because you mentioned trust,

26:20but sometimes

26:20the industry wants faith.

26:24and those are two completely different notions.

26:29Because faith basically comes with no proofs, right?

26:32Like, okay,

26:33I developed something that is the best,

26:35but just trust me, right?

26:37While trust comes with evidence,

26:40and especially when we overcommit

26:43and about our confidence

26:44on how great and how good

26:46like an AI tool can be,

26:48as you say,

26:49once it fails,

26:50then you miss

26:51and you lose

26:52all the trust that you had.

26:54So that's why it's important

26:55to be as transparent as possible

26:58on also the fails.

26:59But once again,

27:00then who wants to build

27:01their businesses

27:02on something

27:03that is not perfect?

27:04But what's really hard

27:06for us humans,

27:07I think,

27:07to grasp

27:08is that

27:09data is not objective,

27:11it's not all the truth

27:13of the world,

27:14and so since

27:15it's statistics,

27:16then it can also make mistakes.

27:18And we've been talking

27:19a lot about

27:20like what's a fact

27:21and what it's not,

27:23but it made me think

27:24about some of the questions

27:25that don't have

27:26only one answer.

27:27Like,

27:28I have a background

27:29in philosophy

27:30and I've never been able

27:32to use LLMs

27:33in philosophy

27:34all the time

27:35that I ask

27:35like complex questions

27:37about some authors

27:38or anything.

27:39It's always so superficial

27:40and once again,

27:42like the most important thing

27:44that I learned

27:45in philosophy

27:45is that

27:46you don't have

27:47one single truth

27:48but like a variety

27:48of truths

27:49and so

27:50then makes

27:51also the use

27:52of LLMs

27:52even more difficult

27:54and so

27:54going back

27:55to the question

27:56of trust,

27:56I think

27:57if we can be

27:59even more honest

28:00and say,

28:01okay,

28:01listen,

28:02for instance,

28:02if you have

28:03a really sensitive context,

28:05maybe it's not

28:06the best

28:07to deploy

28:08this AI tool

28:09like right away

28:10or if you have

28:11to have

28:12like a back it up

28:13with a team

28:14of full humans

28:15who have

28:15100%

28:17the expertise

28:17to assess it

28:19because otherwise,

28:21like as you said,

28:22like end users

28:22would just be like,

28:24no,

28:25I don't trust

28:25this tool

28:26or,

28:27but also the example

28:28that you made,

28:29like would you trust

28:30an autonomous car

28:31who was like

28:3194% safe?

28:33It's the same

28:34if you flip it,

28:35what if I told you,

28:36but you only have

28:376% chance

28:39to fail

28:40or that it will crash

28:42then also

28:43in your perception

28:44it changes

28:45completely

28:46how you perceive it

28:47and that's why

28:48it's important

28:48to not build

28:49faith in your tool

28:51but mostly trust

28:52but gain trust.

28:54Amir,

28:54do you have any?

28:55I agree 100%.

28:56Like Viktara,

28:57our business

28:57is all about that.

28:58Like we're all about

28:59how can we help

29:00businesses

29:01deploying Gen.I

29:02in their environments

29:03to have that trust

29:05where I know

29:06when this answer

29:07is going out

29:07it's the correct answer.

29:08I know when this agent

29:09is performing this action

29:11it's the correct action

29:12and I also understand

29:14that there's no

29:15data leakage

29:16or security issues

29:17happening.

29:17That's very,

29:18very important to trust

29:18and I can track back

29:20which documents,

29:21which data

29:22were behind that.

29:23Hallucinations

29:24is a very big issue

29:25but believe it or not

29:25most companies

29:26the problem is

29:27their data is the issue.

29:28They have bad data

29:29and if you have

29:30bad data

29:30we say garbage in,

29:32garbage out.

29:32You give that bad data

29:33to the large language model

29:34of course

29:35it's going to give you

29:35a wrong answer.

29:36So anyway,

29:37the short of it

29:37absolutely trust

29:38is the anchor

29:39that will make this succeed

29:41but just like

29:41with any new technology

29:42like credit cards

29:43at the beginning

29:44we don't trust them

29:45and then with proper technology

29:46and with proper usage

29:47we don't even think

29:49about it today.

29:49We use credit cards

29:50for everything.

29:51Wear us down.

29:52Yeah,

29:52I love your rubbish in

29:53rubbish out thing

29:53because data sets

29:54is a whole other thing.

29:56I know we have

29:56run out of time

29:57but I want to hear

29:57from Sasha

29:58just any final thoughts

30:00on this point.

30:01Sure,

30:01a couple of really quick points.

30:03first is Jada,

30:04I'm so happy

30:04that there are people

30:05like you

30:05who are philosophers

30:06who are leading

30:07cutting-edge industry

30:09evolutions around this

30:10and I think it's

30:11absolutely essential

30:12and there's another

30:12wonderful French thinker

30:14named Asma Mala

30:15who is a philosopher

30:16and politologue

30:17who I think frames

30:18the question really well

30:20for France and Europe

30:21and beyond

30:21which is we need

30:22to be asking ourselves

30:23what kind of AI

30:24do we want

30:25and actually fundamentally

30:26what do we want

30:26from AI?

30:27that that's the question

30:28that we need to be asking

30:29but as human beings

30:30what we don't understand

30:31we trust.

30:32We don't trust

30:33and so we really need

30:35to focus on understanding

30:36this technology

30:37and what it can and can't do

30:38but also agree

30:39on taxonomies of risk.

30:41Where is there risk

30:42for example

30:42in deploying

30:43in the health sector

30:44where we need to focus

30:45and maybe over-index

30:46on human oversight

30:47and where is there low risk

30:49where we can move quickly

30:50and intentionally

30:51and make sure

30:52that we maintain

30:54I would say

30:54that trust

30:55and part of that

30:55is education

30:56and then lastly

30:57I would just say

30:58that we're really convinced

30:59that responsibility

31:00is actually the key

31:01speaking to your point

31:02because responsibility

31:03will unlock the trust

31:04which will unlock adoption

31:06which will unlock innovation

31:07and so that kind of virtuous cycle

31:10I think is absolutely essential

31:11and we see trust

31:12not only as the starting point

31:13but the middle point

31:14and the end point

31:14it's something that's lost

31:16really quickly

31:16and that needs to be maintained

31:18and it takes everybody

31:19working together

31:19philosophers, scientists,

31:21industry and governments

31:22to think through

31:23what these kinds of frameworks

31:24can look like

31:25at a global scale.

31:26Brilliant

31:27well you've heard it here

31:27please use AI responsibly

31:29thank you so much

31:30to my panel for joining me

31:31thank you everybody

31:32thank you so much for joining me

31:33thank you so much for joining me

31:34thank you so much for joining me

Catégorie

Transcription

Commentaires

Recommandations