Medical Coding Exam Questions

I tested GPT-5.2 and the AI model's mixed results raise tough questions

Subjected to my battery of 10 text tests and 4 image challenges, OpenAI's latest model barely edged out GPT-5.1. What are Plus subscribers actually paying for?

Microsoft

MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering

Large language models (LLMs) have achieved impressive performance on medical question-answering benchmarks. However, high benchmark accuracy does not imply that the performance generalizes to ...
