Test Run #4 Analysis

Comparing model performance on the EXPERIENCES Benchmark.

Overall Avg. Score: 0.541
Best Model: Llama 4
Highest Model Score: 0.541

[Chart: Model Scores per Language]

© 2025 LLM Benchmarker. All rights reserved.