
Models

Browse all models on BenchGecko

BenchGecko tracks every major AI model with benchmark scores, pricing, and context windows. As of April 2026, 974 models are tracked across all major providers.

Model Categories

Category | Count | Description
LLM | 821 | Large language models (text in, text out)
Multimodal | 144 | Models with vision, audio, or other multimodal capabilities
Image Generation | 6 | Image generation models
Embedding | 3 | Text embedding models
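The four category counts add up exactly to the 974-model headline figure, which suggests that each model is assigned to a single category. A quick arithmetic check, using only the numbers from the table:

```python
# Category counts copied from the table above (April 2026 snapshot).
category_counts = {
    "LLM": 821,
    "Multimodal": 144,
    "Image Generation": 6,
    "Embedding": 3,
}

total = sum(category_counts.values())
print(total)  # 974, matching the headline count

# Share of the catalog that each category represents.
for name, count in category_counts.items():
    print(f"{name}: {count / total:.1%}")
```

LLMs make up roughly 84% of the catalog, with multimodal models at roughly 15%.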

Data Points Per Model

Each model profile on BenchGecko includes:

  • Benchmark scores across up to 128 evaluation suites
  • Pricing from every provider offering that model (input and output prices per million tokens)
  • Context window size in tokens
  • Release date and version history
  • Open source status
  • Provider availability matrix showing which providers offer the model
  • Price history with daily granularity
  • Score evolution as new benchmarks are added
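To show how per-million-token pricing translates into the cost of an actual request, here is a minimal sketch of a model profile record. The class and field names (`ModelProfile`, `ProviderPrice`, `input_per_mtok`, and so on) are hypothetical and do not reflect BenchGecko's real schema. The sketch also assumes prices are in USD, and it covers only a subset of the fields listed above (no version or price history). The model name and prices in the example are made up.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProviderPrice:
    """One provider's price for a model, per million tokens (hypothetical schema, USD assumed)."""
    provider: str
    input_per_mtok: float
    output_per_mtok: float

    def request_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Cost of one request: tokens * (price per million tokens) / 1,000,000."""
        return (input_tokens * self.input_per_mtok
                + output_tokens * self.output_per_mtok) / 1_000_000


@dataclass
class ModelProfile:
    """Hypothetical subset of a model profile; field names are illustrative only."""
    name: str
    context_window: int  # tokens
    release_date: date
    open_source: bool
    scores: dict[str, float] = field(default_factory=dict)  # benchmark name -> score
    prices: list[ProviderPrice] = field(default_factory=list)

    def cheapest(self, input_tokens: int, output_tokens: int) -> ProviderPrice:
        """Provider with the lowest total cost for a request of the given size."""
        return min(self.prices, key=lambda p: p.request_cost(input_tokens, output_tokens))


example = ModelProfile(
    name="example-model",  # hypothetical model, dates, and prices
    context_window=200_000,
    release_date=date(2026, 1, 15),
    open_source=False,
    scores={"MMLU": 88.0},
    prices=[
        ProviderPrice("provider-a", input_per_mtok=3.0, output_per_mtok=15.0),
        ProviderPrice("provider-b", input_per_mtok=2.5, output_per_mtok=20.0),
    ],
)

best = example.cheapest(input_tokens=10_000, output_tokens=2_000)
print(best.provider, round(best.request_cost(10_000, 2_000), 4))  # provider-a 0.06
```

Provider B has the lower input price but the higher output price. For a 10,000-in / 2,000-out request, it costs $0.065 versus $0.06 for provider A. This is why comparing providers on a single price column can be misleading.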

Benchmark Coverage

Models are evaluated across these categories:

Category | Key Benchmarks | What It Measures
Knowledge | MMLU, MMLU-Pro | Broad academic knowledge across 57+ subjects
Coding | HumanEval, SWE-bench Verified, MBPP, LiveCodeBench, Aider | Code generation and real-world software engineering
Mathematics | MATH, GSM8K, AIME 2024, AMC 2023 | Mathematical reasoning from arithmetic to competition-level
Science | GPQA Diamond | Graduate-level physics, chemistry, and biology
Reasoning | ARC Challenge, HellaSwag, WinoGrande, BBH | Logical and commonsense reasoning
Long Context | RULER, Needle in a Haystack, LongBench v2, GraphWalks BFS | Performance at extended context lengths (128K-1M tokens)
Instruction | IFEval, AlpacaEval 2.0, MT-Bench | Instruction following and alignment
Safety | TruthfulQA, SimpleQA | Factual accuracy and hallucination resistance
Human Preference | Chatbot Arena Elo, Arena Hard | Human preference rankings from blind comparisons

Full benchmark list: benchgecko.ai/benchmarks
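The category table can be treated as a lookup from category to benchmarks. The sketch below uses that lookup to compute a per-category average for one model. It is only an illustration and not BenchGecko's scoring method. It covers a subset of the categories, and it assumes all scores share a 0-100 scale, which is why Chatbot Arena Elo ratings are left out. It also assumes that a missing benchmark is skipped rather than counted as zero. The example scores are made up.

```python
from statistics import mean

# Subset of the category -> benchmark mapping from the table above.
BENCHMARK_CATEGORIES = {
    "Knowledge": ["MMLU", "MMLU-Pro"],
    "Coding": ["HumanEval", "SWE-bench Verified", "MBPP", "LiveCodeBench", "Aider"],
    "Mathematics": ["MATH", "GSM8K", "AIME 2024", "AMC 2023"],
    "Science": ["GPQA Diamond"],
}


def category_averages(scores: dict[str, float]) -> dict[str, float]:
    """Mean score per category, using only benchmarks the model has results for.

    Categories with no reported benchmarks are omitted instead of scored as zero.
    """
    result = {}
    for category, benchmarks in BENCHMARK_CATEGORIES.items():
        present = [scores[b] for b in benchmarks if b in scores]
        if present:
            result[category] = mean(present)
    return result


# Illustrative scores for a single hypothetical model.
example = {"MMLU": 88.0, "MMLU-Pro": 72.0, "HumanEval": 90.0, "GPQA Diamond": 60.0}
print(category_averages(example))
# {'Knowledge': 80.0, 'Coding': 90.0, 'Science': 60.0}
```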

Filtering and Views

The model rankings page supports:

  • Category filters: All, Coding, Reasoning, Vision
  • Provider filter: Show only models from a specific provider
  • Open source toggle: Filter to open-weight models only
  • Price range slider: Set maximum input/output price
  • Context window minimum: Filter by minimum context length
  • Benchmark sort: Sort by any specific benchmark score
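The filters combine as a conjunction: a model appears only if it passes every active filter. An optional benchmark sort is then applied. The sketch below shows that behavior on a hypothetical in-memory record (`Model` and `filter_models` are illustrative names, not a BenchGecko API). It leaves out the Coding/Reasoning/Vision category filter because the page does not say how models are assigned to those views. The catalog entries are made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Model:
    """Hypothetical minimal record; field names are illustrative only."""
    name: str
    provider: str
    open_weights: bool
    input_price: float   # per million input tokens
    output_price: float  # per million output tokens
    context_window: int  # tokens
    scores: dict[str, float] = field(default_factory=dict)


def filter_models(
    models: list[Model],
    provider: Optional[str] = None,
    open_only: bool = False,
    max_input_price: Optional[float] = None,
    max_output_price: Optional[float] = None,
    min_context: int = 0,
    sort_by: Optional[str] = None,
) -> list[Model]:
    """Keep models that pass every active filter, then optionally sort by one benchmark."""
    selected = [
        m for m in models
        if (provider is None or m.provider == provider)
        and (not open_only or m.open_weights)
        and (max_input_price is None or m.input_price <= max_input_price)
        and (max_output_price is None or m.output_price <= max_output_price)
        and m.context_window >= min_context
    ]
    if sort_by is not None:
        # Highest score first; models without this benchmark sort last.
        selected.sort(key=lambda m: m.scores.get(sort_by, float("-inf")), reverse=True)
    return selected


catalog = [
    Model("alpha", "ProvA", False, 3.0, 15.0, 200_000, {"SWE-bench Verified": 70.0}),
    Model("beta", "ProvB", True, 0.5, 1.5, 128_000, {"SWE-bench Verified": 55.0}),
    Model("gamma", "ProvB", True, 0.2, 0.6, 32_000, {"SWE-bench Verified": 40.0}),
]

cheap = filter_models(catalog, max_input_price=1.0, sort_by="SWE-bench Verified")
print([m.name for m in cheap])  # ['beta', 'gamma']
```

Sorting models with a missing score to the end, rather than dropping them, keeps the filter and sort steps independent. A model never disappears from the results just because you changed the sort column.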

The Matrix

The Matrix provides a dense spreadsheet view of the top 20 models showing multiple benchmark scores simultaneously. Toggle between All, Coding, Reasoning, and Vision views.
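A view like the Matrix is essentially a pivot: rows are models, columns are the benchmarks for the selected view, and empty cells mark missing results. The following is a minimal sketch under several assumptions. The view-to-column mapping (`VIEWS`) is hypothetical and uses benchmark names from the table above. Vision is omitted because no vision benchmarks are listed on this page. The rule for picking the top models (the mean of the view's available benchmarks) is also an assumption. The scores are made up.

```python
from statistics import mean

# Hypothetical view -> column mapping, using benchmark names from the table above.
VIEWS = {
    "Coding": ["HumanEval", "SWE-bench Verified", "LiveCodeBench"],
    "Reasoning": ["ARC Challenge", "HellaSwag", "WinoGrande", "BBH"],
}


def matrix(models: dict[str, dict[str, float]], view: str, top_n: int = 20):
    """Return (header, rows) for a dense score grid; missing cells are None."""
    columns = VIEWS[view]

    def rank_key(name: str) -> float:
        # Assumed ranking rule: mean of the view's benchmarks the model has.
        present = [models[name][c] for c in columns if c in models[name]]
        return mean(present) if present else float("-inf")

    top = sorted(models, key=rank_key, reverse=True)[:top_n]
    rows = [[name] + [models[name].get(c) for c in columns] for name in top]
    return ["Model"] + columns, rows


scores_by_model = {
    "model-a": {"HumanEval": 90.0, "SWE-bench Verified": 60.0},
    "model-b": {"HumanEval": 85.0, "SWE-bench Verified": 70.0, "LiveCodeBench": 50.0},
    "model-c": {"MMLU": 80.0},
}

header, rows = matrix(scores_by_model, "Coding", top_n=2)
print(header)
for row in rows:
    print(row)
```

Averaging only the available cells makes this ranking rule favor models with fewer reported results. In the example, model-a ranks above model-b partly because it has no LiveCodeBench score. A production view would need an explicit policy for such gaps.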

Frontier Race

The homepage features a Frontier Race chart showing the top 10 models by average benchmark score, updated in real time. Track which model leads across coding, reasoning, and vision tasks.
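Ranking the top 10 by average benchmark score is a top-k selection. `heapq.nlargest` handles it without sorting the whole catalog. The sketch below is illustrative only. It assumes scores are normalized to a common 0-100 scale. It also adds a hypothetical `min_benchmarks` threshold so that a model with a single high score cannot lead the chart. The page does not document BenchGecko's actual rule for missing benchmarks. The scores are made up.

```python
import heapq
from statistics import mean


def frontier(scores_by_model: dict[str, dict[str, float]], k: int = 10,
             min_benchmarks: int = 3) -> list[tuple[str, float]]:
    """Top-k (model, mean score) pairs, highest first.

    min_benchmarks is a hypothetical guard: models with fewer reported
    benchmarks are excluded from the ranking.
    """
    averages = (
        (name, mean(scores.values()))
        for name, scores in scores_by_model.items()
        if len(scores) >= min_benchmarks
    )
    return heapq.nlargest(k, averages, key=lambda item: item[1])


scores_by_model = {
    "model-a": {"MMLU": 80.0, "GPQA Diamond": 70.0, "HumanEval": 90.0},
    "model-b": {"MMLU": 85.0, "GPQA Diamond": 75.0, "HumanEval": 95.0},
    "model-c": {"MMLU": 99.0},  # too few benchmarks to be ranked
}

top = frontier(scores_by_model, k=10)
print(top)  # [('model-b', 85.0), ('model-a', 80.0)]
```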