# AI Agents
Browse all agents on BenchGecko
BenchGecko tracks 165 AI agents with capability scores across real-world evaluation benchmarks. The agent leaderboard is a live map of the AI agent economy: every brain (model), every harness, every tool.
## Agent Benchmarks
| Benchmark | What It Tests | Why It Matters |
|---|---|---|
| SWE-bench Verified | Real GitHub issues turned into coding tasks | Gold standard for coding agents |
| GAIA | Multi-step reasoning with tool use | Tests general agent capability |
| OSWorld | Operating system interaction | Can the agent use a computer? |
| Tau-bench | Multi-step agentic reasoning chains | Measures planning and execution |
| MLE-bench | Machine learning engineering tasks | End-to-end ML pipeline capability |
| WebArena | Web browsing and form interaction | Real-world web automation |
## Agent Profiles
Each agent profile on BenchGecko includes:
- Scores across all applicable benchmarks
- Underlying model(s) powering the agent
- GitHub stars and weekly momentum
- Category classification (coding, research, general, specialized)
- Pricing model (if applicable)
- Architecture description (scaffold, framework, tool chain)
## GitHub Momentum
Beyond benchmark scores, BenchGecko tracks GitHub momentum for every agent: stars, forks, issues, and commit velocity. The fastest risers in the agent ecosystem surface here before they hit mainstream awareness.
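One simple way to turn periodic star-count snapshots into a momentum figure is to measure stars gained per day between samples. This is a sketch of that idea, not BenchGecko's actual methodology; the raw counts themselves could come from the GitHub REST API's repository endpoint (`stargazers_count`, `forks_count`, `open_issues_count`).

```python
from datetime import date

def star_velocity(snapshots: list[tuple[date, int]]) -> float:
    """Stars gained per day between the first and last snapshot.

    `snapshots` is a list of (date, star_count) pairs, oldest first.
    Illustrative momentum metric only; not BenchGecko's formula.
    """
    (d0, s0), (d1, s1) = snapshots[0], snapshots[-1]
    days = (d1 - d0).days
    return (s1 - s0) / days if days else 0.0

# Two weekly snapshots: 350 stars gained over 7 days.
samples = [(date(2024, 6, 1), 11_650), (date(2024, 6, 8), 12_000)]
print(star_velocity(samples))  # 50.0
```

The same calculation applies unchanged to forks, open issues, or commit counts, so one helper can drive every momentum column on a leaderboard.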
## Related Pages
- MCP Servers -- tool infrastructure powering agents
- Model Rankings -- the LLMs behind the agents
- Mindshare Arena -- which agents dominate developer conversation
- Benchmarks -- evaluation methodology
- BenchGecko Homepage