
Methodology


This page documents how BenchGecko collects, verifies, and presents data across all platform sections.

Data Collection

Benchmark Scores

  1. Primary source: Official model technical reports and model cards published by providers
  2. Cross-verification: EleutherAI lm-evaluation-harness (v0.4+), BigCode HumanEval+
  3. Third-party verification: LMSYS Chatbot Arena, Open LLM Leaderboard, LiveBench
  4. Conflict resolution: Provider-reported scores are used when evaluation settings match standard configurations. Independently reproduced scores are used when settings differ or are unreported.

Score Normalization

All benchmark scores on BenchGecko are min-max normalized to a 0-100 scale. The average score displayed on model profiles is a weighted average across all available benchmarks for that model, with harder benchmarks (lower average scores across all models) receiving higher weight.

A model must have scores on at least 3 benchmarks to receive a ranked average score.
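The normalization and weighting scheme above can be sketched as follows. This is an illustrative reconstruction, not BenchGecko's actual code; the function names and the exact difficulty weight (here, 100 minus a benchmark's mean score across all models) are assumptions consistent with the description.

```python
def min_max_normalize(score, lo, hi):
    """Rescale a raw benchmark score to a 0-100 scale,
    given the lowest and highest observed scores on that benchmark."""
    return 100.0 * (score - lo) / (hi - lo)

def weighted_average(model_scores, benchmark_means):
    """Weighted average of a model's normalized scores.

    Harder benchmarks (lower mean score across all models) receive
    higher weight. Returns None when fewer than 3 benchmarks are
    available, since such models are not given a ranked average.
    """
    if len(model_scores) < 3:
        return None
    # Illustrative difficulty weight: lower benchmark mean -> higher weight.
    weights = {b: 100.0 - benchmark_means[b] for b in model_scores}
    total = sum(weights.values())
    return sum(model_scores[b] * weights[b] for b in model_scores) / total
```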

Pricing

  1. Collected from official API pricing pages and direct API responses
  2. Sources include OpenRouter, OpenAI, Anthropic, Google, xAI, DeepSeek, Mistral, and others
  3. Denominated in USD per 1 million tokens (input and output separately)
  4. Updated within 48 hours of announced pricing changes
  5. Historical pricing tracked with daily granularity
  6. Batch API pricing shown separately from synchronous pricing
  7. Free tiers documented but not used for default price comparisons
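Because prices are denominated per 1 million tokens with input and output billed separately, the cost of a single request follows directly. A minimal sketch (the rates in the example are hypothetical, not any provider's actual pricing):

```python
def request_cost_usd(input_tokens, output_tokens,
                     input_price_per_m, output_price_per_m):
    """Cost of one synchronous API request in USD,
    given prices quoted per 1 million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical rates of $3/M input and $15/M output:
# request_cost_usd(50_000, 2_000, 3.0, 15.0) -> 0.18
```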

Full pricing data: benchgecko.ai/pricing

Economy Data

  1. Company financials sourced from public filings, Crunchbase, PitchBook, and press releases
  2. Valuations from latest funding rounds or public market capitalization
  3. Revenue estimates from quarterly reports and analyst consensus
  4. Updated weekly with immediate updates for major funding announcements
  5. All monetary values in USD

Full economy data: benchgecko.ai/economy

Mindshare

  1. Signals aggregated weekly from Reddit, Hacker News, GitHub, arXiv, X, and tech news
  2. Share of voice calculated as percentage of total AI-related mentions
  3. Trend analysis using 7-day rolling windows with basis point changes
  4. Entity classification: models, agents, companies, people, topics
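The share-of-voice and basis-point calculations above reduce to simple arithmetic; a sketch, with illustrative function names (1 percentage point = 100 basis points):

```python
def share_of_voice(entity_mentions, total_ai_mentions):
    """An entity's share of all AI-related mentions, as a percentage."""
    return 100.0 * entity_mentions / total_ai_mentions

def weekly_trend_bps(current_share_pct, prior_share_pct):
    """Week-over-week change in share of voice, in basis points."""
    return round((current_share_pct - prior_share_pct) * 100)

# A model with 250 of 10,000 mentions holds a 2.5% share; moving
# from 2.1% to 2.5% week over week is a +40 bps change.
```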

Full mindshare data: benchgecko.ai/mindshare

Compute Infrastructure

  1. Foundry utilization from TSMC quarterly earnings and industry reports
  2. Chip specifications from official datasheets and independent benchmarks
  3. Memory pricing and lead times from SK Hynix, Samsung, and Micron supply chain data
  4. Energy data from utility filings, PPA announcements, and datacenter press releases
  5. Updated weekly

Full compute data: benchgecko.ai/compute

Update Frequency

Data Type         Frequency       Source
Model metadata    Every 2 hours   Provider APIs
Benchmark scores  As published    Technical reports, leaderboards
Pricing           Daily           Provider pricing pages
Economy           Weekly          Public filings, press
Mindshare         Weekly          Social and developer platforms
Compute           Weekly          Industry reports, filings
Changelog         Real-time       All sources

Changelog: benchgecko.ai/changelog

Quality Assurance

  • Automated range validation (scores within 0-100, prices > 0)
  • Cross-reference between provider-reported and independently evaluated scores
  • Monthly full dataset audit of top 50 models by traffic
  • Automated detection of stale data (> 30 days since last check)
  • Anomaly detection for sudden score or price changes
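The range and staleness checks above can be expressed as a single validation pass over a model record. This is a sketch under assumed field names ("scores", "input_price", "output_price", "last_checked"), not BenchGecko's internal schema:

```python
from datetime import datetime, timedelta

def validate_record(record, now=None):
    """Run the basic automated checks on one model record.

    Returns a list of issues found; an empty list means the record
    passes range validation and is not stale.
    """
    now = now or datetime.now()
    issues = []
    # Scores must lie within the normalized 0-100 scale.
    for bench, score in record.get("scores", {}).items():
        if not 0 <= score <= 100:
            issues.append(f"score out of range: {bench}={score}")
    # Prices must be strictly positive.
    for field in ("input_price", "output_price"):
        if field in record and record[field] <= 0:
            issues.append(f"non-positive price: {field}")
    # Flag records not re-checked within 30 days.
    last = record.get("last_checked")
    if last is not None and now - last > timedelta(days=30):
        issues.append("stale: last checked over 30 days ago")
    return issues
```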

Data Sources

Data is drawn from 10+ authoritative sources, acknowledged at the bottom of every page on BenchGecko:

OpenRouter, Epoch AI, SWE-bench, MCP Registry, Chatbot Arena, HuggingFace, LiveBench, Artificial Analysis, SEAL, Aider.

Citation

Every page on BenchGecko includes a Cite & Share bar with formats: APA, MLA, BibTeX, Chicago, and plain text. Attribution is required on the free API tier.