# AI Agents
Browse all agents on BenchGecko
BenchGecko tracks 165 AI agents with capability scores across real-world evaluation benchmarks. The agent leaderboard is a live map of the AI agent economy: every brain (model), every harness, every tool.
## Agent Benchmarks
| Benchmark | What It Tests | Why It Matters |
|---|---|---|
| SWE-bench Verified | Real GitHub issues turned into coding tasks | Gold standard for coding agents |
| GAIA | Multi-step reasoning with tool use | Tests general agent capability |
| OSWorld | Operating system interaction | Can the agent use a computer? |
| Tau-bench | Multi-step agentic reasoning chains | Measures planning and execution |
| MLE-bench | Machine learning engineering tasks | End-to-end ML pipeline capability |
| WebArena | Web browsing and form interaction | Real-world web automation |
## Agent Profiles
Each agent profile on BenchGecko includes:
- Scores across all applicable benchmarks
- Underlying model(s) powering the agent
- GitHub stars and weekly momentum
- Category classification (coding, research, general, specialized)
- Pricing model (if applicable)
- Architecture description (scaffold, framework, tool chain)
## GitHub Momentum
Beyond benchmark scores, BenchGecko tracks GitHub momentum for every agent: stars, forks, issues, and commit velocity. The fastest risers in the agent ecosystem surface here before they hit mainstream awareness.
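One simple way to turn periodic star-count snapshots into a momentum figure is to measure stars gained per day between samples. This is a sketch of that idea, not BenchGecko's actual methodology; the raw counts themselves could come from the GitHub REST API's repository endpoint (`stargazers_count`, `forks_count`, `open_issues_count`).

```python
from datetime import date

def star_velocity(snapshots: list[tuple[date, int]]) -> float:
    """Stars gained per day between the first and last snapshot.

    `snapshots` is a list of (date, star_count) pairs, oldest first.
    Illustrative momentum metric only; not BenchGecko's formula.
    """
    (d0, s0), (d1, s1) = snapshots[0], snapshots[-1]
    days = (d1 - d0).days
    return (s1 - s0) / days if days else 0.0

# Two weekly snapshots: 350 stars gained over 7 days.
samples = [(date(2024, 6, 1), 11_650), (date(2024, 6, 8), 12_000)]
print(star_velocity(samples))  # 50.0
```

The same calculation applies unchanged to forks, open issues, or commit counts, so one helper can drive every momentum column on a leaderboard.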
## Related Pages
- MCP Servers -- tool infrastructure powering agents
- Model Rankings -- the LLMs behind the agents
- Mindshare Arena -- which agents dominate developer conversation
- Benchmarks -- evaluation methodology
- BenchGecko Homepage