Test Run #2 Analysis
Comparing model performance on the CONTENT Benchmark.
Overall Avg. Score: 0.540
Best Model: GPT O3
Highest Model Score: 0.564