Doctor Bot vs. Human MD: The 85% Accuracy Showdown That'll Make You Question Your GP!
Frank Hu
2025/7/12
Discuss Topic: "Should we trust AI doctors more than human physicians when they're 4x more accurate but lack a heartbeat?"
Opinion A:
"Hell yes! My WebMD self-diagnosis era is over!"
Opinion B:
"Nope - I want my doctor to sweat when delivering bad news!"
Opinion A or B? It is your decision and turn now!
Category: 🤖 Technology
Transcript
00:00
that Microsoft is announcing today, which is that you've created, and let me know if I get this right,
00:04
a diagnostician bot that effectively will be able to dialogue with a patient's case file
00:10
and then make a diagnosis. So it's actually two bots, one, and it's a system. So it's not just
00:16
Microsoft's bots, but it can be on any bot, where one bot basically acts as a gatekeeper to all
00:22
a patient's medical information, and then the other one is basically acting as the diagnostician
00:26
or the physician that goes in and asks questions about that history. And you found some pretty
00:32
incredible results when it comes to the effectiveness of this system to be able to diagnose
00:38
correctly. Yeah, it's a great summary. That's exactly right. We essentially wanted to simulate
00:44
what it would be like for an AI to act as a diagnostician, to ask the patient a series of
00:52
questions, to draw out their case history, go through a whole bunch of tests that they may
00:58
have had, pathology and radiology, and then iteratively examine the information that it's getting in order
01:05
to improve the accuracy and reliability of its prediction about what your diagnosis actually
01:11
is. And we actually use the New England Journal of Medicine case histories, hundreds of these past
01:19
cases. One of these cases comes out every single week, and it's like an ultimate crossword for
01:24
doctors. They obviously don't see the answer until the following week, and it's a big guessing game
01:31
to go back through five to seven pages of very detailed history, and then try and figure out what
01:36
the diagnosis actually turns out to be. Okay. And so what happens is these two bots work together in
01:43
conjunction to figure out what the diagnosis is? Why use a system like this? I mean, I thought one
01:48
of the benefits of generative AI is it can sort of take in a lot of information and then come to these
01:54
answers, sometimes in one shot. So what is the benefit of having this dialogue between two
01:59
bots? So the big breakthrough of the last six months or so in AI is these thinking or reasoning models
02:05
that can obviously query other agents or find other information sources at inference time to improve
02:14
the quality of its response. Rather than just giving the first best answer, it instead goes and, you know,
02:21
consults a range of different sources, and that improves the quality of the information that it finally
02:27
gets to. So we see that this orchestrator, which under the hood uses four different models from the major
02:32
providers, can actually improve the accuracy of each of the individual models and collectively all of
02:39
them together by a very significant degree, about 10% or so. So it's a big step forward. And I think that
02:45
as the AI models get commoditized, you know, really all the value will be added in that final layer of
02:52
orchestration, product integration. And that's what we're seeing with this diagnostic orchestrator.
02:57
So a 10% increase in accurately diagnosing on top of the standard LLMs.
03:06
Yeah. And in fact, we actually benchmark that against human performance. So we had a whole bunch
03:12
of expert physicians play this simulated diagnosis environment game, and they on average get about
03:20
one in five right. So about 20%. Whereas our orchestrator gets about 85% accuracy. So it's four
03:28
times more accurate, which, you know, in like my career, I've never seen such a big gulf between
03:36
human-level performance and the AI system's performance. Many years ago, I worked on like lots
03:43
of diagnoses for radiology and head and neck cancer and mammography. And the goal was just to take a
03:51
single radiology exam and predict, you know, yes or no, does it have cancer? And that was the most we
03:56
could do. Whereas now, it is not just producing a binary class output, but it's actually producing a
04:03
very detailed diagnosis, and doing that sequentially through this interactive
04:10
dialogue mechanism. And so that massively improves the accuracy.
04:14
Okay, so it can do 80% accurate diagnoses, which sounds incredible. And I have to pressure test this
04:19
a little bit. Because what if you have the same thing happen to medicine, as is happening with
04:25
beginner-level code, where basically, there are people who are learning to code using these copilots,
04:31
but then when something breaks, it becomes harder for them to figure out what's going on. So if you're a
04:35
doctor relying on something amazing, 80% accuracy, but you sort of outsource some
04:42
of your thinking to these bots, is that a problem down the line?
04:46
Yeah, so this isn't just giving a black box answer. That's why the sequential diagnosis part is so
04:52
important. Because you can watch the AI in real time, ask questions of the case history, get an answer,
04:59
shape a new question, get an answer, present a new question, then ask for a
05:05
different type of testing, get those results, interpret it, then give an answer. So the dialogic
05:11
nature means that a human doctor can follow along and actually learn in a very transparent way. It's
05:17
almost like having an interpretability mechanism inside the black box of the LLM, because you can see
05:23
its thinking process in real time. And in fact, you don't just see the sort of chain of thought,
05:28
which is, you know, a monologue. We've actually created five different types of agent,
05:35
which all have a debate. And we call this chain of debate: they negotiate with one another, they try to
05:40
prioritize, you know, certain different aspects like cost or efficiency. And it's the coordination
05:47
of those different skill sets among the doctor agents that is actually what
05:52
makes this so effective.
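The question-and-answer loop described above can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the system discussed in the interview: the names `Gatekeeper` and `diagnose`, the toy case file, the fixed question plan, and the rule table are all hypothetical, and a real diagnostician agent would choose its next question with a language model rather than a fixed list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Gatekeeper:
    """Hypothetical gatekeeper: holds the case file, reveals only what is asked for."""
    case_file: Dict[str, str]
    log: List[Tuple[str, str]] = field(default_factory=list)

    def answer(self, request: str) -> str:
        reply = self.case_file.get(request, "not available")
        self.log.append((request, reply))  # visible trail a human can follow
        return reply


def diagnose(gatekeeper: Gatekeeper,
             plan: List[str],
             rules: Dict[str, Dict[str, str]]) -> str:
    """Toy diagnostician: ask one item at a time, stop as soon as a rule is satisfied."""
    findings: Dict[str, str] = {}
    for request in plan:
        findings[request] = gatekeeper.answer(request)
        for diagnosis, required in rules.items():
            if all(findings.get(k) == v for k, v in required.items()):
                return diagnosis
    return "undetermined"


# Made-up toy data, for illustration only.
gk = Gatekeeper({
    "chief complaint": "fever and cough",
    "travel history": "none",
    "chest x-ray": "lobar consolidation",
})
plan = ["chief complaint", "travel history", "chest x-ray"]
rules = {"pneumonia": {"chest x-ray": "lobar consolidation"}}

result = diagnose(gk, plan, rules)
for request, reply in gk.log:
    print(f"asked: {request!r} -> {reply!r}")
print("diagnosis:", result)
```

The point of the sketch is the logged exchange: every request and reply is recorded in order, which is the kind of step-by-step trail a human doctor could follow.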
05:54
But I want to ask again, because even if a doctor can watch this all take place, it effectively
06:01
turns their role and let's say this becomes something that doctors use, it turns their role
06:06
in diagnosis from something that's active and really thinking through to more of like a passive,
06:11
okay, I'm watching the bots go through it. And I do wonder if there is some benefit in having the
06:18
doctor actually have to do that themselves, because it helps the brain work in ways that it doesn't
06:24
when it's just watching bots have a conversation.
06:27
Yeah, I mean, I think that's totally true. I just still think this is going to be an amazing
06:30
education tool for doctors to actually learn about the breadth of cases they never would have
06:35
encountered. For example, we actually ran the DXO orchestrator last week on the most recent case
06:41
study in the New England Journal of Medicine. And it correctly diagnosed a case that had
06:47
only ever been seen 1,500 times in all of medical literature; it was such an obscure long-tail
06:52
disease. So very few doctors are ever going to get the chance to see that. And so the ability to
06:59
accurately and preventatively detect these kinds of conditions in the wild in production, I think will
07:05
massively outweigh, you know, the risk of doctors not being able to sort of exercise in the way that
07:12
you describe. I think the tools just change how you work. And, you know, obviously, everyone will
07:17
sort of have to adapt to that over time. But the utility is just so unquestionably beneficial that I
07:22
think it makes it worthwhile.
07:25
Now, is it able to do that because the cases are potentially in the training data? And even if they
07:32
are, does it really matter? I mean, if it is able to diagnose these rare conditions,
07:36
should we really mind if it's seen it before in the training data?
07:42
Well, part of the reason why we partnered with the New England Journal of Medicine is because each
07:46
week, they put out a brand new case, which has never even been digitized. So there's no question
07:52
that it's not in the training data. This case, for example, from last week, there's absolutely no way
07:57
it's in the training data, because it has literally just been published. And, you know, we think that's
08:01
the case going back for all of the previous cases too. So I don't think there's
08:06
any chance of that. This really is doing a kind of a sort of abstraction of judgment. It's not just
08:14
reproducing training data, it is actually doing some kind of inference or thinking based on the
08:19
knowledge that it does already have.