Roupen Odabashian: The Doctors Were Fine. The AI Was Excellent

Roupen Odabashian, Hematology/Oncology Fellow at the Karmanos Cancer Institute, shared a post on LinkedIn:

“The most important AI-in-medicine result of the last two years is the one almost nobody wants to repeat. Giving doctors a frontier model did not make them better diagnosticians.

In a randomized trial, physicians using GPT-4 plus standard resources scored 76% on diagnostic reasoning. The control group, using UpToDate and Google, scored 74%. Statistically a tie.

Here’s the twist that should keep us up at night. The model alone, with no doctor attached, scored about 16 points higher than the physicians it was supposed to be helping.

So the AI was excellent. The doctors were fine. And the combination added almost nothing. The bottleneck wasn’t the model’s intelligence. It was the human-AI interface. We glance at the output, anchor on our first impression, and use the tool to confirm rather than to think.

That reframes the whole problem. We’ve spent two years racing to build models smart enough for medicine. This says the harder, less glamorous work is teaching clinicians how to actually use one. Capability was never the gap. Adoption behavior is.

What would it take to train a doctor who gets more than 2 points out of a system that’s already better than them alone?”

Title: Large Language Model Influence on Diagnostic Reasoning

Authors: Ethan Goh, Robert Gallo, Jason Hom, Eric Strong, Yingjie Weng, Hannah Kerman, Joséphine A. Cool, Zahir Kanjee, Andrew S. Parsons, Neera Ahuja, Eric Horvitz, Daniel Yang, Arnold Milstein, Andrew P. J. Olson, Adam Rodman, Jonathan H. Chen
