
Methodology


This page documents how BenchGecko collects, verifies, and presents data across all platform sections.

Data Collection

Benchmark Scores

  1. Primary source: Official model technical reports and model cards published by providers
  2. Cross-verification: EleutherAI lm-evaluation-harness (v0.4+), BigCode HumanEval+
  3. Third-party verification: LMSYS Chatbot Arena, Open LLM Leaderboard, LiveBench
  4. Conflict resolution: Provider-reported scores are used when evaluation settings match standard configurations. Independently reproduced scores are used when settings differ or are unreported.

Score Normalization

All benchmark scores on BenchGecko are min-max normalized to a 0-100 scale. The average score displayed on model profiles is a weighted average across all available benchmarks for that model, with harder benchmarks (lower average scores across all models) receiving higher weight.

A model must have scores on at least 3 benchmarks to receive a ranked average score.
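The normalization and weighting scheme above can be sketched as follows. This is an illustrative reconstruction, not BenchGecko's actual code; the function names and the exact difficulty weight (here, 100 minus a benchmark's mean score across all models) are assumptions consistent with the description.

```python
def min_max_normalize(score, lo, hi):
    """Rescale a raw benchmark score to a 0-100 scale,
    given the lowest and highest observed scores on that benchmark."""
    return 100.0 * (score - lo) / (hi - lo)

def weighted_average(model_scores, benchmark_means):
    """Weighted average of a model's normalized scores.

    Harder benchmarks (lower mean score across all models) receive
    higher weight. Returns None when fewer than 3 benchmarks are
    available, since such models are not given a ranked average.
    """
    if len(model_scores) < 3:
        return None
    # Illustrative difficulty weight: lower benchmark mean -> higher weight.
    weights = {b: 100.0 - benchmark_means[b] for b in model_scores}
    total = sum(weights.values())
    return sum(model_scores[b] * weights[b] for b in model_scores) / total
```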

Pricing

  1. Collected from official API pricing pages and direct API responses
  2. Sources include OpenRouter, OpenAI, Anthropic, Google, xAI, DeepSeek, Mistral, and others
  3. Denominated in USD per 1 million tokens (input and output separately)
  4. Updated within 48 hours of announced pricing changes
  5. Historical pricing tracked with daily granularity
  6. Batch API pricing shown separately from synchronous pricing
  7. Free tiers documented but not used for default price comparisons
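Because prices are denominated per 1 million tokens with input and output billed separately, the cost of a single request follows directly. A minimal sketch (the rates in the example are hypothetical, not any provider's actual pricing):

```python
def request_cost_usd(input_tokens, output_tokens,
                     input_price_per_m, output_price_per_m):
    """Cost of one synchronous API request in USD,
    given prices quoted per 1 million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical rates of $3/M input and $15/M output:
# request_cost_usd(50_000, 2_000, 3.0, 15.0) -> 0.18
```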

Full pricing data: benchgecko.ai/pricing

Economy Data

  1. Company financials sourced from public filings, Crunchbase, PitchBook, and press releases
  2. Valuations from latest funding rounds or public market capitalization
  3. Revenue estimates from quarterly reports and analyst consensus
  4. Updated weekly with immediate updates for major funding announcements
  5. All monetary values in USD

Full economy data: benchgecko.ai/economy

Mindshare

  1. Signals aggregated weekly from Reddit, Hacker News, GitHub, arXiv, X, and tech news
  2. Share of voice calculated as percentage of total AI-related mentions
  3. Trend analysis using 7-day rolling windows with basis point changes
  4. Entity classification: models, agents, companies, people, topics
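The share-of-voice and basis-point calculations above reduce to simple arithmetic; a sketch, with illustrative function names (1 percentage point = 100 basis points):

```python
def share_of_voice(entity_mentions, total_ai_mentions):
    """An entity's share of all AI-related mentions, as a percentage."""
    return 100.0 * entity_mentions / total_ai_mentions

def weekly_trend_bps(current_share_pct, prior_share_pct):
    """Week-over-week change in share of voice, in basis points."""
    return round((current_share_pct - prior_share_pct) * 100)

# A model with 250 of 10,000 mentions holds a 2.5% share; moving
# from 2.1% to 2.5% week over week is a +40 bps change.
```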

Full mindshare data: benchgecko.ai/mindshare

Compute Infrastructure

  1. Foundry utilization from TSMC quarterly earnings and industry reports
  2. Chip specifications from official datasheets and independent benchmarks
  3. Memory pricing and lead times from SK Hynix, Samsung, and Micron supply chain data
  4. Energy data from utility filings, PPA announcements, and datacenter press releases
  5. Updated weekly

Full compute data: benchgecko.ai/compute

Update Frequency

Data Type         Frequency       Source
Model metadata    Every 2 hours   Provider APIs
Benchmark scores  As published    Technical reports, leaderboards
Pricing           Daily           Provider pricing pages
Economy           Weekly          Public filings, press
Mindshare         Weekly          Social and developer platforms
Compute           Weekly          Industry reports, filings
Changelog         Real-time       All sources

Changelog: benchgecko.ai/changelog

Quality Assurance

  • Automated range validation (scores within 0-100, prices > 0)
  • Cross-reference between provider-reported and independently evaluated scores
  • Monthly full dataset audit of top 50 models by traffic
  • Automated detection of stale data (> 30 days since last check)
  • Anomaly detection for sudden score or price changes
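The range and staleness checks above can be expressed as a single validation pass over a model record. This is a sketch under assumed field names ("scores", "input_price", "output_price", "last_checked"), not BenchGecko's internal schema:

```python
from datetime import datetime, timedelta

def validate_record(record, now=None):
    """Run the basic automated checks on one model record.

    Returns a list of issues found; an empty list means the record
    passes range validation and is not stale.
    """
    now = now or datetime.now()
    issues = []
    # Scores must lie within the normalized 0-100 scale.
    for bench, score in record.get("scores", {}).items():
        if not 0 <= score <= 100:
            issues.append(f"score out of range: {bench}={score}")
    # Prices must be strictly positive.
    for field in ("input_price", "output_price"):
        if field in record and record[field] <= 0:
            issues.append(f"non-positive price: {field}")
    # Flag records not re-checked within 30 days.
    last = record.get("last_checked")
    if last is not None and now - last > timedelta(days=30):
        issues.append("stale: last checked over 30 days ago")
    return issues
```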

Data Sources

Data is drawn from 10+ authoritative sources, acknowledged at the bottom of every page on BenchGecko:

OpenRouter, Epoch AI, SWE-bench, MCP Registry, Chatbot Arena, HuggingFace, LiveBench, Artificial Analysis, SEAL, Aider.

Citation

Every page on BenchGecko includes a Cite & Share bar with formats: APA, MLA, BibTeX, Chicago, and plain text. Attribution is required on the free API tier.