Test Run #6 Analysis
Comparing model performance for the GPQA 2026 benchmark.
Overall Avg. Score: 0.415
Best Model: GPT O3
Highest Model Score: 0.457