Models
Browse all models on BenchGecko
BenchGecko tracks every major AI model, recording benchmark scores, pricing, and context windows. As of April 2026, 974 models are tracked across all major providers.
Model Categories
| Category | Count | Description |
|---|---|---|
| LLM | 821 | Large language models (text in, text out) |
| Multimodal | 144 | Models with vision, audio, or other multimodal capabilities |
| Image Generation | 6 | Image generation models |
| Embedding | 3 | Text embedding models |
Data Points Per Model
Each model profile on BenchGecko includes:
- Benchmark scores across up to 128 evaluation suites
- Pricing from every provider offering that model (input and output per million tokens)
- Context window size in tokens
- Release date and version history
- Open source status
- Provider availability matrix showing which providers offer the model
- Price history with daily granularity
- Score evolution as new benchmarks are added
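To make the profile fields above concrete, here is a minimal sketch of one model's data as a plain Python dict. The field names, model name, providers, and prices are all illustrative assumptions, not BenchGecko's actual schema; the helper shows one common way to pick a cheapest provider from per-provider pricing.

```python
# Hypothetical sketch of a BenchGecko model profile; field names and
# values are illustrative, not the real API schema.
model = {
    "name": "example-model-v1",       # illustrative model name
    "context_window": 200_000,        # tokens
    "open_source": False,
    "benchmarks": {"MMLU": 0.88, "GPQA Diamond": 0.62},
    "pricing": {                      # USD per million tokens, per provider
        "provider-a": {"input": 3.00, "output": 15.00},
        "provider-b": {"input": 2.50, "output": 12.00},
    },
}

def blended_price(p, ratio=3):
    """Blend input/output price assuming a 3:1 input:output token ratio."""
    return (p["input"] * ratio + p["output"]) / (ratio + 1)

# Cheapest provider for this model by blended price
cheapest = min(model["pricing"].items(), key=lambda kv: blended_price(kv[1]))
print(cheapest[0])  # provider-b
```

The 3:1 blending ratio is itself an assumption; workloads that generate long outputs would weight the output price more heavily.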
Benchmark Coverage
Models are evaluated across these categories:
| Category | Key Benchmarks | What It Measures |
|---|---|---|
| Knowledge | MMLU, MMLU-Pro | Broad academic knowledge across 57+ subjects |
| Coding | HumanEval, SWE-bench Verified, MBPP, LiveCodeBench, Aider | Code generation and real-world software engineering |
| Mathematics | MATH, GSM8K, AIME 2024, AMC 2023 | Mathematical reasoning from arithmetic to competition-level |
| Science | GPQA Diamond | Graduate-level physics, chemistry, and biology |
| Reasoning | ARC Challenge, HellaSwag, WinoGrande, BBH | Logical and commonsense reasoning |
| Long Context | RULER, Needle in a Haystack, LongBench v2, GraphWalks BFS | Performance at extended context lengths (128K-1M tokens) |
| Instruction | IFEval, AlpacaEval 2.0, MT-Bench | Instruction following and alignment |
| Safety | TruthfulQA, SimpleQA | Factual accuracy and hallucination resistance |
| Human Preference | Chatbot Arena ELO, Arena Hard | Human preference rankings from blind comparisons |
Full benchmark list: benchgecko.ai/benchmarks
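A per-category average, like the ones implied by the table above, can be sketched by mapping each benchmark to its category and taking the mean. The mapping below is a small subset of the table, and the scores are made up; BenchGecko's actual aggregation may differ.

```python
# Illustrative benchmark-to-category mapping (subset of the table above)
CATEGORY = {
    "MMLU": "Knowledge", "MMLU-Pro": "Knowledge",
    "HumanEval": "Coding", "SWE-bench Verified": "Coding",
    "MATH": "Mathematics", "GSM8K": "Mathematics",
}

# Made-up scores for a single model
scores = {"MMLU": 0.86, "MMLU-Pro": 0.74, "HumanEval": 0.92, "GSM8K": 0.95}

def category_averages(scores):
    """Group benchmark scores by category and return the mean per category."""
    sums, counts = {}, {}
    for bench, score in scores.items():
        cat = CATEGORY.get(bench)
        if cat is None:          # skip benchmarks we have no category for
            continue
        sums[cat] = sums.get(cat, 0.0) + score
        counts[cat] = counts.get(cat, 0) + 1
    return {cat: sums[cat] / counts[cat] for cat in sums}

print(category_averages(scores))
```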
Filtering and Views
The model rankings page supports:
- Category filters: All, Coding, Reasoning, Vision
- Provider filter: Show only models from a specific provider
- Open source toggle: Filter to open-weight models only
- Price range slider: Set maximum input/output price
- Context window minimum: Filter by minimum context length
- Benchmark sort: Sort by any specific benchmark score
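The filters above compose naturally as predicates over model records. The sketch below assumes a simple `Model` record (the fields, model names, and scores are invented for illustration) and applies the open-source toggle, price ceiling, context minimum, and benchmark sort in sequence.

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    # Illustrative record; not BenchGecko's actual data model.
    name: str
    provider: str
    open_source: bool
    input_price: float      # USD per million input tokens
    context_window: int     # tokens
    scores: dict = field(default_factory=dict)

models = [
    Model("alpha", "provider-a", False, 3.0, 200_000, {"SWE-bench Verified": 0.65}),
    Model("beta",  "provider-b", True,  0.5, 128_000, {"SWE-bench Verified": 0.48}),
    Model("gamma", "provider-a", True,  0.2,  32_000, {"SWE-bench Verified": 0.30}),
]

def filter_models(models, open_source_only=False, max_input_price=None, min_context=0):
    """Apply the toggle, price-slider, and context-minimum filters."""
    out = []
    for m in models:
        if open_source_only and not m.open_source:
            continue
        if max_input_price is not None and m.input_price > max_input_price:
            continue
        if m.context_window < min_context:
            continue
        out.append(m)
    return out

# Open-weight models with at least 100K context, sorted by one benchmark
hits = filter_models(models, open_source_only=True, min_context=100_000)
hits.sort(key=lambda m: m.scores.get("SWE-bench Verified", 0.0), reverse=True)
print([m.name for m in hits])  # ['beta']
```

Using `scores.get(..., 0.0)` in the sort key keeps the sort stable when a model has not been evaluated on the chosen benchmark.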
The Matrix
The Matrix provides a dense spreadsheet view of the top 20 models showing multiple benchmark scores simultaneously. Toggle between All, Coding, Reasoning, and Vision views.
Frontier Race
The homepage features a Frontier Race chart showing the top 10 models by average benchmark score, updated in real time. Track which model leads across coding, reasoning, and vision tasks.
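A ranking like the Frontier Race can be sketched as a mean over each model's available scores, sorted descending. The model names and scores below are invented, and a plain mean over whatever benchmarks a model has may differ from BenchGecko's actual aggregation method.

```python
# Made-up per-model scores across task areas
leaderboard = {
    "model-a": {"coding": 0.90, "reasoning": 0.80, "vision": 0.70},
    "model-b": {"coding": 0.85, "reasoning": 0.95},
    "model-c": {"coding": 0.60, "reasoning": 0.65, "vision": 0.55},
}

def top_n(board, n=10):
    """Rank models by the mean of their available scores, best first."""
    avg = {name: sum(s.values()) / len(s) for name, s in board.items()}
    return sorted(avg, key=avg.get, reverse=True)[:n]

print(top_n(leaderboard, n=2))  # ['model-b', 'model-a']
```

Note that averaging only over available scores rewards models evaluated on fewer benchmarks; a production ranking would need to handle missing scores more carefully.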
Related Pages
- Pricing Comparison -- find the cheapest provider for any model
- Compare Tool -- side-by-side model comparison
- Provider Directory -- all inference providers
- AI Agents -- agent leaderboard using these models
- API Access -- retrieve model data programmatically