Jul 22, 2024, 15:48
Which AI does better at the USMLE?
Scott Gottlieb, Partner at New Enterprise Associates, shared on X:
“We fed questions from the USMLE Step 3 medical licensing exam to the top 5 LLMs — Google Gemini, ChatGPT, Claude, Llama, and Grok. We wanted to see which LLM has the best medical aptitude. This is how they did.”
Here’s how they scored:
- ChatGPT-4o (OpenAI) — 49/50 questions correct (98%)
- Claude 3.5 (Anthropic) — 45/50 (90%)
- Gemini Advanced (Google) — 43/50 (86%)
- Grok (xAI) — 42/50 (84%)
- HuggingChat (Llama) — 33/50 (66%)
Source: Scott Gottlieb/X