Models
Browse all models on BenchGecko
BenchGecko tracks every major AI model, recording benchmark scores, pricing, and context windows. As of April 2026, 974 models are tracked across all major providers.
Model Categories
| Category | Count | Description |
|---|---|---|
| LLM | 821 | Large language models (text in, text out) |
| Multimodal | 144 | Models with vision, audio, or other multimodal capabilities |
| Image Generation | 6 | Image generation models |
| Embedding | 3 | Text embedding models |
Data Points Per Model
Each model profile on BenchGecko includes:
- Benchmark scores across up to 128 evaluation suites
- Pricing from every provider offering that model (input and output per million tokens)
- Context window size in tokens
- Release date and version history
- Open source status
- Provider availability matrix showing which providers offer the model
- Price history with daily granularity
- Score evolution as new benchmarks are added
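To make the profile fields above concrete, here is a minimal sketch of one model's data as a plain Python dict. The field names, model name, providers, and prices are all illustrative assumptions, not BenchGecko's actual schema; the helper shows one common way to pick a cheapest provider from per-provider pricing.

```python
# Hypothetical sketch of a BenchGecko model profile; field names and
# values are illustrative, not the real API schema.
model = {
    "name": "example-model-v1",       # illustrative model name
    "context_window": 200_000,        # tokens
    "open_source": False,
    "benchmarks": {"MMLU": 0.88, "GPQA Diamond": 0.62},
    "pricing": {                      # USD per million tokens, per provider
        "provider-a": {"input": 3.00, "output": 15.00},
        "provider-b": {"input": 2.50, "output": 12.00},
    },
}

def blended_price(p, ratio=3):
    """Blend input/output price assuming a 3:1 input:output token ratio."""
    return (p["input"] * ratio + p["output"]) / (ratio + 1)

# Cheapest provider for this model by blended price
cheapest = min(model["pricing"].items(), key=lambda kv: blended_price(kv[1]))
print(cheapest[0])  # provider-b
```

The 3:1 blending ratio is itself an assumption; workloads that generate long outputs would weight the output price more heavily.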
Benchmark Coverage
Models are evaluated across these categories:
| Category | Key Benchmarks | What It Measures |
|---|---|---|
| Knowledge | MMLU, MMLU-Pro | Broad academic knowledge across 57+ subjects |
| Coding | HumanEval, SWE-bench Verified, MBPP, LiveCodeBench, Aider | Code generation and real-world software engineering |
| Mathematics | MATH, GSM8K, AIME 2024, AMC 2023 | Mathematical reasoning from arithmetic to competition-level |
| Science | GPQA Diamond | Graduate-level physics, chemistry, and biology |
| Reasoning | ARC Challenge, HellaSwag, WinoGrande, BBH | Logical and commonsense reasoning |
| Long Context | RULER, Needle in a Haystack, LongBench v2, GraphWalks BFS | Performance at extended context lengths (128K-1M tokens) |
| Instruction | IFEval, AlpacaEval 2.0, MT-Bench | Instruction following and alignment |
| Safety | TruthfulQA, SimpleQA | Factual accuracy and hallucination resistance |
| Human Preference | Chatbot Arena ELO, Arena Hard | Human preference rankings from blind comparisons |
Full benchmark list: benchgecko.ai/benchmarks
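A per-category average, like the ones implied by the table above, can be sketched by mapping each benchmark to its category and taking the mean. The mapping below is a small subset of the table, and the scores are made up; BenchGecko's actual aggregation may differ.

```python
# Illustrative benchmark-to-category mapping (subset of the table above)
CATEGORY = {
    "MMLU": "Knowledge", "MMLU-Pro": "Knowledge",
    "HumanEval": "Coding", "SWE-bench Verified": "Coding",
    "MATH": "Mathematics", "GSM8K": "Mathematics",
}

# Made-up scores for a single model
scores = {"MMLU": 0.86, "MMLU-Pro": 0.74, "HumanEval": 0.92, "GSM8K": 0.95}

def category_averages(scores):
    """Group benchmark scores by category and return the mean per category."""
    sums, counts = {}, {}
    for bench, score in scores.items():
        cat = CATEGORY.get(bench)
        if cat is None:          # skip benchmarks we have no category for
            continue
        sums[cat] = sums.get(cat, 0.0) + score
        counts[cat] = counts.get(cat, 0) + 1
    return {cat: sums[cat] / counts[cat] for cat in sums}

print(category_averages(scores))
```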
Filtering and Views
The model rankings page supports:
- Category filters: All, Coding, Reasoning, Vision
- Provider filter: Show only models from a specific provider
- Open source toggle: Filter to open-weight models only
- Price range slider: Set maximum input/output price
- Context window minimum: Filter by minimum context length
- Benchmark sort: Sort by any specific benchmark score
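The filters above compose naturally as predicates over model records. The sketch below assumes a simple `Model` record (the fields, model names, and scores are invented for illustration) and applies the open-source toggle, price ceiling, context minimum, and benchmark sort in sequence.

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    # Illustrative record; not BenchGecko's actual data model.
    name: str
    provider: str
    open_source: bool
    input_price: float      # USD per million input tokens
    context_window: int     # tokens
    scores: dict = field(default_factory=dict)

models = [
    Model("alpha", "provider-a", False, 3.0, 200_000, {"SWE-bench Verified": 0.65}),
    Model("beta",  "provider-b", True,  0.5, 128_000, {"SWE-bench Verified": 0.48}),
    Model("gamma", "provider-a", True,  0.2,  32_000, {"SWE-bench Verified": 0.30}),
]

def filter_models(models, open_source_only=False, max_input_price=None, min_context=0):
    """Apply the toggle, price-slider, and context-minimum filters."""
    out = []
    for m in models:
        if open_source_only and not m.open_source:
            continue
        if max_input_price is not None and m.input_price > max_input_price:
            continue
        if m.context_window < min_context:
            continue
        out.append(m)
    return out

# Open-weight models with at least 100K context, sorted by one benchmark
hits = filter_models(models, open_source_only=True, min_context=100_000)
hits.sort(key=lambda m: m.scores.get("SWE-bench Verified", 0.0), reverse=True)
print([m.name for m in hits])  # ['beta']
```

Using `scores.get(..., 0.0)` in the sort key keeps the sort stable when a model has not been evaluated on the chosen benchmark.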
The Matrix
The Matrix provides a dense spreadsheet view of the top 20 models showing multiple benchmark scores simultaneously. Toggle between All, Coding, Reasoning, and Vision views.
Frontier Race
The homepage features a Frontier Race chart showing the top 10 models by average benchmark score, updated in real time. Track which model leads across coding, reasoning, and vision tasks.
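A ranking like the Frontier Race can be sketched as a mean over each model's available scores, sorted descending. The model names and scores below are invented, and a plain mean over whatever benchmarks a model has may differ from BenchGecko's actual aggregation method.

```python
# Made-up per-model scores across task areas
leaderboard = {
    "model-a": {"coding": 0.90, "reasoning": 0.80, "vision": 0.70},
    "model-b": {"coding": 0.85, "reasoning": 0.95},
    "model-c": {"coding": 0.60, "reasoning": 0.65, "vision": 0.55},
}

def top_n(board, n=10):
    """Rank models by the mean of their available scores, best first."""
    avg = {name: sum(s.values()) / len(s) for name, s in board.items()}
    return sorted(avg, key=avg.get, reverse=True)[:n]

print(top_n(leaderboard, n=2))  # ['model-b', 'model-a']
```

Note that averaging only over available scores rewards models evaluated on fewer benchmarks; a production ranking would need to handle missing scores more carefully.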
Related Pages
- Pricing Comparison -- find the cheapest provider for any model
- Compare Tool -- side-by-side model comparison
- Provider Directory -- all inference providers
- AI Agents -- agent leaderboard using these models
- API Access -- retrieve model data programmatically