Benchmarks
GPQA 2026
Official General-Purpose Question Answering benchmark for 2026.
BLOCKCHAINS Benchmark
Aequitas incidunt cubo carmen.
LIFETIME VALUE Benchmark
Brevis cruentus fugit turba aduro benevolentia caute dolorum numquam.
INTERFACES Benchmark
Texo currus textilis somnus uredo derelinquo laboriosam tum vilitas.
INFRASTRUCTURES Benchmark
Cursus curto complectus comburo alienus.
Recent Test Runs
Test Run #1 - INTERFACES Benchmark
Models: Llama 4, Gemini 2.5 Pro, GPT O3, Claude 4 Sonnet | Languages: es, fr, en
FAILED | 18 days ago
Test Run #2 - LIFETIME VALUE Benchmark
Models: Claude 4 Sonnet, GPT O3 | Languages: de, it, fr, es
COMPLETED | yesterday
Test Run #3 - INFRASTRUCTURES Benchmark
Models: Gemini 2.5 Pro, Claude 4 Sonnet, GPT O3 | Languages: de, it
RUNNING | 12 days ago
Test Run #4 - INFRASTRUCTURES Benchmark
Models: GPT O3, Claude 4 Sonnet, Gemini 2.5 Pro | Languages: es, de, en, fr, it
FAILED | 15 days ago
Test Run #5 - GPQA 2026
Models: Gemini 2.5 Pro, Claude 4 Sonnet, Llama 4, GPT O3 | Languages: fr, en, it, es, de
RUNNING | 16 days ago
Test Run #6 - GPQA 2026
Models: Llama 4, Claude 4 Sonnet | Languages: es, fr, it, en, de
RUNNING | 20 days ago
Test Run #7 - BLOCKCHAINS Benchmark
Models: Llama 4, Gemini 2.5 Pro | Languages: de, en, it
COMPLETED | 20 days ago
Test Run #8 - INFRASTRUCTURES Benchmark
Models: Llama 4, GPT O3 | Languages: de, it, en, fr
FAILED | yesterday
Test Run #9 - INFRASTRUCTURES Benchmark
Models: GPT O3, Llama 4, Gemini 2.5 Pro | Languages: de, es, fr, en, it
COMPLETED | 25 days ago
Test Run #10 - LIFETIME VALUE Benchmark
Models: Claude 4 Sonnet, Gemini 2.5 Pro, Llama 4 | Languages: fr, es, de, en, it
COMPLETED | 27 days ago