LLM Benchmarker

Test Run #4 Analysis

Comparing model performance for the GPQA 2026 benchmark.

Overall Avg. Score: 0.542

Best Model: GPT O3

Highest Model Score: 0.634

[Chart: Model Scores per Language]

© 2026 LLM Benchmarker. All rights reserved.