Which AI does better at USMLE
Jul 22, 2024, 15:48

Scott Gottlieb, Partner at New Enterprise Associates, shared on X:

We fed questions from the USMLE Step 3 medical licensing exam to the top 5 LLMs: Google Gemini, ChatGPT, Claude, Llama, and Grok. We wanted to see which LLM has the best medical aptitude. This is how they did.

Here’s how they scored:

  • ChatGPT-4o (OpenAI) — 49/50 questions correct (98%)
  • Claude 3.5 (Anthropic) — 45/50 (90%)
  • Gemini Advanced (Google) — 43/50 (86%)
  • Grok (xAI) — 42/50 (84%)
  • HuggingChat (Llama) — 33/50 (66%)

Source: Scott Gottlieb/X