Which AI Does Best on the USMLE?

Scott Gottlieb, Partner at New Enterprise Associates, shared on X:

“We fed questions from the USMLE Step 3 medical licensing exam to the top 5 LLMs: Google Gemini, ChatGPT, Claude, Llama, and Grok. We wanted to see which LLM has the best medical aptitude. This is how they did.”

Here’s how they scored:

ChatGPT-4o (OpenAI) — 49/50 questions correct (98%)
Claude 3.5 (Anthropic) — 45/50 (90%)
Gemini Advanced (Google) — 43/50 (86%)
Grok (xAI) — 42/50 (84%)
HuggingChat (Llama) — 33/50 (66%)

Source: Scott Gottlieb/X

