Methodology
Full methodology on BenchGecko
This page documents how BenchGecko collects, verifies, and presents data across all platform sections.
Data Collection
Benchmark Scores
- Primary source: Official model technical reports and model cards published by providers
- Cross-verification: EleutherAI lm-evaluation-harness (v0.4+), BigCode HumanEval+
- Third-party verification: LMSYS Chatbot Arena, Open LLM Leaderboard, LiveBench
- Conflict resolution: Provider-reported scores are used when evaluation settings match standard configurations. Independently reproduced scores are used when settings differ or are unreported.
Score Normalization
All benchmark scores on BenchGecko are min-max normalized to a 0-100 scale. The average score displayed on model profiles is a weighted average across all available benchmarks for that model; harder benchmarks (those with lower average scores across all models) receive higher weight.
A model must have scores on at least three benchmarks to receive a ranked average score.
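The normalization and weighting described above can be sketched as follows. The exact weighting function is not published here, so the inverse-difficulty weight (`100 - benchmark mean`) is an illustrative assumption, as is the function naming:

```python
def min_max_normalize(scores):
    """Normalize raw per-model scores for one benchmark to a 0-100 scale."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {m: 50.0 for m in scores}  # degenerate case: all models tied
    return {m: 100.0 * (s - lo) / (hi - lo) for m, s in scores.items()}

def weighted_average(model_scores, benchmark_means):
    """Average one model's normalized scores, weighting harder benchmarks
    (lower mean score across all models) more heavily.

    Assumed weight function: 100 - benchmark mean (not BenchGecko's exact formula).
    """
    if len(model_scores) < 3:  # minimum-coverage rule from the text
        return None            # not eligible for a ranked average score
    weights = {b: 100.0 - benchmark_means[b] for b in model_scores}
    total_w = sum(weights.values())
    return sum(model_scores[b] * weights[b] for b in model_scores) / total_w
```

With equal benchmark difficulty the result reduces to a plain mean; models with fewer than three benchmark scores return `None` rather than a misleading average.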
Pricing
- Collected from official API pricing pages and direct API responses
- Sources include OpenRouter, OpenAI, Anthropic, Google, xAI, DeepSeek, Mistral, and others
- Denominated in USD per 1 million tokens (input and output separately)
- Updated within 48 hours of announced pricing changes
- Historical pricing tracked with daily granularity
- Batch API pricing shown separately from synchronous pricing
- Free tiers documented but not used for default price comparisons
Full pricing data: benchgecko.ai/pricing
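Since prices are denominated in USD per 1 million tokens with input and output billed separately, a request's cost works out as below. The prices in the usage comment are placeholders, not real provider rates:

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """USD cost of one API request.

    input_price / output_price are in USD per 1 million tokens,
    matching the denomination used on BenchGecko pricing pages.
    """
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# e.g. 500k input + 500k output tokens at hypothetical $2 / $10 per 1M tokens:
# request_cost(500_000, 500_000, 2.0, 10.0) -> 6.0 USD
```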
Economy Data
- Company financials sourced from public filings, Crunchbase, PitchBook, and press releases
- Valuations from latest funding rounds or public market capitalization
- Revenue estimates from quarterly reports and analyst consensus
- Updated weekly with immediate updates for major funding announcements
- All monetary values in USD
Full economy data: benchgecko.ai/economy
Mindshare
- Signals aggregated weekly from Reddit, Hacker News, GitHub, arXiv, X, and tech news
- Share of voice calculated as an entity's percentage of total AI-related mentions
- Trend analysis using 7-day rolling windows, with week-over-week changes reported in basis points
- Entity classification: models, agents, companies, people, topics
Full mindshare data: benchgecko.ai/mindshare
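The share-of-voice and basis-point calculations above amount to the following sketch (function names are illustrative, not BenchGecko internals):

```python
def share_of_voice(mentions, entity):
    """An entity's share of total AI-related mentions, as a percentage."""
    total = sum(mentions.values())
    return 100.0 * mentions.get(entity, 0) / total if total else 0.0

def bp_change(prev_share, curr_share):
    """Week-over-week change in share of voice, in basis points.

    1 percentage point = 100 basis points, so a move from 3.0% to 3.5%
    is reported as +50 bp.
    """
    return round((curr_share - prev_share) * 100)
```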
Compute Infrastructure
- Foundry utilization from TSMC quarterly earnings and industry reports
- Chip specifications from official datasheets and independent benchmarks
- Memory pricing and lead times from SK Hynix, Samsung, and Micron supply chain data
- Energy data from utility filings, PPA announcements, and datacenter press releases
- Updated weekly
Full compute data: benchgecko.ai/compute
Update Frequency
| Data Type | Frequency | Source |
|---|---|---|
| Model metadata | Every 2 hours | Provider APIs |
| Benchmark scores | As published | Technical reports, leaderboards |
| Pricing | Daily | Provider pricing pages |
| Economy | Weekly | Public filings, press |
| Mindshare | Weekly | Social and developer platforms |
| Compute | Weekly | Industry reports, filings |
| Changelog | Real-time | All sources |
Changelog: benchgecko.ai/changelog
Quality Assurance
- Automated range validation (scores within 0-100, prices > 0)
- Cross-reference between provider-reported and independently evaluated scores
- Monthly full dataset audit of top 50 models by traffic
- Automated detection of stale data (> 30 days since last check)
- Anomaly detection for sudden score or price changes
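The first, third, and fourth checks above can be sketched as a single record validator; the record schema (`score`, `price` keys) and thresholds other than those stated in the list are assumptions for illustration:

```python
from datetime import date

def validate_record(record, today, last_checked):
    """Flag basic data-quality issues for one model record.

    Implements the range checks (scores within 0-100, prices > 0) and the
    staleness check (> 30 days since last check) described above.
    """
    issues = []
    score = record.get("score")
    if score is not None and not (0 <= score <= 100):
        issues.append("score out of range")
    price = record.get("price")
    if price is not None and price <= 0:
        issues.append("non-positive price")
    if (today - last_checked).days > 30:
        issues.append("stale data")
    return issues
```

A record passing all checks yields an empty list; anything else is queued for manual review.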
Data Sources
10+ authoritative sources, acknowledged at the bottom of every page on BenchGecko:
OpenRouter, Epoch AI, SWE-bench, MCP Registry, Chatbot Arena, HuggingFace, LiveBench, Artificial Analysis, SEAL, Aider.
Citation
Every page on BenchGecko includes a Cite & Share bar with formats: APA, MLA, BibTeX, Chicago, and plain text. Attribution is required on the free API tier.