AI Agents

Browse all agents on BenchGecko

BenchGecko tracks 165 AI agents with capability scores across real-world evaluation benchmarks. The agent leaderboard maps the AI agent economy as it currently stands: every underlying model (the "brain"), every harness, every tool.

Agent Benchmarks

Benchmark          | What It Tests                               | Why It Matters
SWE-bench Verified | Real GitHub issues turned into coding tasks | Gold standard for coding agents
GAIA               | Multi-step reasoning with tool use          | Tests general agent capability
OSWorld            | Operating system interaction                | Can the agent use a computer?
Tau-bench          | Multi-step agentic reasoning chains         | Measures planning and execution
MLE-bench          | Machine learning engineering tasks          | End-to-end ML pipeline capability
WebArena           | Web browsing and form interaction           | Real-world web automation

Agent Profiles

Each agent tracked on BenchGecko includes:

  • Scores across all applicable benchmarks
  • Underlying model(s) powering the agent
  • GitHub stars and weekly momentum
  • Category classification (coding, research, general, specialized)
  • Pricing model (if applicable)
  • Architecture description (scaffold, framework, tool chain)
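
The profile fields listed above could be modelled as a single record. The following is a minimal sketch only: the class name `AgentProfile`, its field names, and the helper `rank_by` are hypothetical and do not reflect BenchGecko's actual schema or API. Only the four category values come from the list above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Category values taken from the profile list; everything else is assumed.
CATEGORIES = {"coding", "research", "general", "specialized"}


@dataclass
class AgentProfile:
    """Hypothetical record mirroring the profile fields listed above."""

    name: str
    category: str
    models: list[str] = field(default_factory=list)          # underlying model(s)
    scores: dict[str, float] = field(default_factory=dict)   # benchmark name -> score
    github_stars: int = 0
    weekly_star_delta: int = 0                                # stars gained in the last week
    pricing: Optional[str] = None                             # None when not applicable
    architecture: str = ""                                    # scaffold, framework, tool chain

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")

    def score_on(self, benchmark: str) -> Optional[float]:
        """Return the score for a benchmark, or None if it does not apply."""
        return self.scores.get(benchmark)


def rank_by(agents: list[AgentProfile], benchmark: str) -> list[AgentProfile]:
    """Sort agents by score on one benchmark, highest first.

    Agents without a score on that benchmark are left out rather than
    treated as zero, since not every benchmark applies to every agent.
    """
    scored = [a for a in agents if a.score_on(benchmark) is not None]
    return sorted(scored, key=lambda a: a.score_on(benchmark), reverse=True)
```

Keeping scores in a dictionary keyed by benchmark name lets an agent carry only the benchmarks that apply to it, which matches the "all applicable benchmarks" wording above.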

GitHub Momentum

Beyond benchmark scores, BenchGecko tracks GitHub momentum for every agent: stars, forks, issues, and commit velocity. The fastest risers in the agent ecosystem surface here before they hit mainstream awareness.
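
As an illustration of how such momentum figures might be derived, the sketch below compares two weekly repository snapshots. BenchGecko does not publish its formula here, so the names `RepoSnapshot` and `weekly_momentum`, and the choice of week-over-week deltas, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RepoSnapshot:
    """Hypothetical point-in-time counts for one agent's repository."""

    stars: int
    forks: int
    open_issues: int
    commits_last_week: int


def weekly_momentum(previous: RepoSnapshot, current: RepoSnapshot) -> dict:
    """Compare two snapshots taken one week apart (assumed interval)."""
    star_delta = current.stars - previous.stars
    # Relative growth is undefined for a repository that had no stars; report 0.0.
    star_growth = star_delta / previous.stars if previous.stars else 0.0
    return {
        "star_delta": star_delta,
        "star_growth": star_growth,
        "fork_delta": current.forks - previous.forks,
        "issue_delta": current.open_issues - previous.open_issues,
        "commit_velocity": current.commits_last_week,
    }
```

Reporting both the absolute star delta and the relative growth helps separate a small project rising quickly from a large one adding the same number of stars.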