Table 1: Accuracy (%) of Different Models Across Subfields
Abbreviations: OP = Optics; AMS = Atomic, Molecular, and Subatomic Physics; ME = Mechanics; SP = Solid Physics and Measurement; TH = Thermodynamics and Statistical Physics; EM = Electromagnetism and Electrodynamics; RE = Relativity; QM = Quantum Mechanics.
Models | Overall | OP | AMS | ME | SP | TH | EM | RE | QM |
---|---|---|---|---|---|---|---|---|---|
Multiple-Choice Questions (MCQs) | | | | | | | | | |
GPT-4o | 33.7 | 42.9 | 39.2 | 40.1 | 33.3 | 32.8 | 35.4 | 41.2 | 36.2 |
Claude-3.5-Sonnet | 36.5 | 43.2 | 50.9 | 41.3 | 32.1 | 44.0 | 35.8 | 45.5 | 44.9 |
Qwen2.5-VL-72B | 33.4 | 31.9 | 32.9 | 40.8 | 23.9 | 26.9 | 29.8 | 38.3 | 33.9 |
Gemini-2.5-Pro | 26.5 | 27.8 | 29.7 | 26.6 | 25.5 | 25.0 | 24.7 | 24.3 | 35.9 |
GPT-o4-mini | 36.7 | 57.9 | 55.3 | 42.1 | 31.2 | 24.7 | 41.2 | 30.9 | 35.9 |
InternVL-3-38B | 33.6 | 41.3 | 41.9 | 37.6 | 21.6 | 26.6 | 29.2 | 12.1 | 32.1 |
Open-Ended Questions (OEQs) | | | | | | | | | |
GPT-4o | 20.9 | 30.5 | 24.1 | 27.8 | 5.0 | 3.4 | 20.2 | 2.1 | 6.2 |
Claude-3.5-Sonnet | 19.0 | 37.6 | 28.5 | 26.2 | 8.0 | 4.8 | 17.9 | 2.1 | 0.0 |
Qwen2.5-VL-72B | 23.7 | 38.5 | 29.1 | 29.5 | 9.0 | 5.5 | 21.4 | 2.1 | 0.0 |
Gemini-2.5-Pro | 25.5 | 49.3 | 31.0 | 35.2 | 6.0 | 4.1 | 23.7 | 2.1 | 0.0 |
GPT-o4-mini | 26.5 | 51.2 | 31.0 | 38.2 | 10.0 | 6.2 | 28.4 | 2.1 | 0.0 |
InternVL-3-38B | 17.7 | 27.4 | 17.9 | 21.0 | 5.5 | 3.4 | 19.3 | 2.1 | 0.0 |