Which AI does better at USMLE
Jul 22, 2024, 15:48

Scott Gottlieb, Partner at New Enterprise Associates, shared on X:

We fed questions from the USMLE Step 3 medical licensing exam to the top 5 LLMs: Google Gemini, ChatGPT, Claude, Llama, and Grok. We wanted to see which LLM has the best medical aptitude. This is how they did.

Here’s how they scored:

  • ChatGPT-4o (OpenAI) — 49/50 questions correct (98%)
  • Claude 3.5 (Anthropic) — 45/50 (90%)
  • Gemini Advanced (Google) — 43/50 (86%)
  • Grok (xAI) — 42/50 (84%)
  • HuggingChat (Llama) — 33/50 (66%)

Source: Scott Gottlieb/X