Test Run #2 Analysis
Comparing model performance on the E-COMMERCE Benchmark.
Overall Avg. Score: 0.470
Best Model: GPT O3
Highest Model Score: 0.479