arena.ai API
Fetch AI model leaderboard rankings from arena.ai across agent, text, image, video, and code arenas. Get ELO scores, confidence intervals, and model metadata.
curl -X GET 'https://api.parse.bot/scraper/dee580cd-1286-483f-818e-02b6459a0d69/get_leaderboard' \
  -H "X-API-Key: $PARSE_API_KEY"
Get the full leaderboard rankings for a specified arena. Returns all ranked models with their scores, confidence intervals, and metadata. The agent arena returns signal-based scores (task outcome, steerability, tool hallucination, etc.), while other arenas return ELO-style ratings with vote counts.
| Param | Type | Description |
|---|---|---|
| arena | string | The arena/modality to fetch rankings for. Accepts exactly one of: agent, text, text-to-image, image-edit, text-to-video, image-to-video, video-edit, code/webdev, code/image-to-webdev. |
{
  "type": "object",
  "fields": {
    "arena": "string",
    "models": "array of model ranking objects",
    "model_count": "integer",
    "last_updated": "string (ISO datetime, agent arena only)",
    "total_sessions": "integer (agent arena only)",
    "leaderboard_slug": "string (non-agent arenas only)"
  },
  "sample": {
    "data": {
      "arena": "agent",
      "models": [
        {
          "rank": 1,
          "model": "GPT 5.5 (High)",
          "license": "Proprietary",
          "sessions": 27140,
          "avg_score": {
            "ci": 0.0129,
            "value": 0.0922,
            "pipelines": 5
          },
          "signal_ci": {
            "steerability": 0.0239,
            "task_outcome_explicit": 0.023
          },
          "rank_spread": {
            "max": 5,
            "min": 1
          },
          "organization": "OpenAI",
          "signal_scores": {
            "steerability": 0.0959,
            "task_outcome_explicit": 0.0613
          }
        }
      ],
      "model_count": 20,
      "last_updated": "2026-06-08T13:00:00.000Z",
      "total_sessions": 463644
    },
    "status": "success"
  }
}
About the arena.ai API
The Arena AI Leaderboard API exposes a single get_leaderboard endpoint that returns ranked AI model data across nine arenas spanning the agent, text, image, video, and code modalities. Each response includes per-model scores, confidence intervals, and metadata. The agent arena also returns signal-based sub-scores for task outcome, steerability, and tool hallucination, plus session-level totals and a last-updated timestamp.
What the API returns
The get_leaderboard endpoint accepts an arena parameter and returns a ranked list of AI models for that modality. The response always includes the arena name, a models array of ranking objects, and a model_count integer. The models array carries per-model scores and confidence intervals, allowing you to compare statistical separation between models rather than just raw rank order.
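As a rough illustration of comparing statistical separation, the sketch below checks whether two entries' score intervals overlap. It uses the agent arena's avg_score shape from the sample response and assumes that ci is a symmetric half-width around value; the listing does not state this, so treat it as an assumption. The helper names and the second model are hypothetical.

```python
def score_interval(model):
    """Return (low, high) for an entry, assuming `ci` is a half-width."""
    avg = model["avg_score"]
    return avg["value"] - avg["ci"], avg["value"] + avg["ci"]


def is_separated(model_a, model_b):
    """True when the two score intervals do not overlap."""
    low_a, high_a = score_interval(model_a)
    low_b, high_b = score_interval(model_b)
    return high_a < low_b or high_b < low_a


# First entry mirrors the sample response; the second is made up for illustration.
top = {"model": "GPT 5.5 (High)", "avg_score": {"value": 0.0922, "ci": 0.0129}}
other = {"model": "hypothetical-model", "avg_score": {"value": 0.0850, "ci": 0.0100}}

print(is_separated(top, other))  # False: the intervals overlap
```

Under this reading, a rank difference between two models whose intervals overlap may not be meaningful.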
Agent arena vs. other arenas
The agent arena response includes fields that the other arenas do not: last_updated (ISO datetime), total_sessions (the number of evaluation sessions behind the rankings), and signal-based sub-scores such as task outcome, steerability, and tool hallucination. Non-agent arenas (the text, image, video, and code arenas) return ELO-style ratings with vote counts and expose a leaderboard_slug string instead. Because of these structural differences, branch on the arena field in your response handling.
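A minimal sketch of that branching, assuming the response is wrapped in a data object as in the sample above. The summarize helper is hypothetical, and the payload is the sample response with its models array trimmed.

```python
def summarize(payload):
    """Build a one-line summary, branching on the `arena` field."""
    data = payload["data"]  # the sample response nests fields under "data"
    base = f"{data['arena']}: {data['model_count']} models"
    if data["arena"] == "agent":
        # Agent-only fields per the response schema.
        return f"{base}, {data['total_sessions']} sessions, updated {data['last_updated']}"
    # Non-agent arenas expose leaderboard_slug instead; .get() tolerates its absence.
    return f"{base}, slug={data.get('leaderboard_slug')}"


agent_payload = {
    "status": "success",
    "data": {
        "arena": "agent",
        "models": [],  # trimmed for brevity
        "model_count": 20,
        "last_updated": "2026-06-08T13:00:00.000Z",
        "total_sessions": 463644,
    },
}
print(summarize(agent_payload))
```

Per-model field names for non-agent arenas are not shown in the sample, so this sketch only touches the top-level fields.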
Supported arenas
The arena input accepts exactly one value per call: agent, text, text-to-image, image-edit, text-to-video, image-to-video, video-edit, code/webdev, or code/image-to-webdev. There is no batch or multi-arena endpoint; separate calls are needed to compare rankings across modalities. The model_count field lets you quickly confirm how many models are ranked without iterating the full models array.
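Since each call takes a single arena, a client might validate the value and build one request URL per arena. The sketch below assumes the arena value is sent as a query-string parameter, which the listing does not confirm; leaderboard_url and ARENAS are hypothetical names.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.parse.bot/scraper/dee580cd-1286-483f-818e-02b6459a0d69/get_leaderboard"

# Accepted values, copied from the parameter table.
ARENAS = (
    "agent", "text", "text-to-image", "image-edit", "text-to-video",
    "image-to-video", "video-edit", "code/webdev", "code/image-to-webdev",
)


def leaderboard_url(arena):
    """Return the request URL for one arena, rejecting unsupported values."""
    if arena not in ARENAS:
        raise ValueError(f"unsupported arena: {arena!r}")
    # Assumption: `arena` travels as a query-string parameter.
    return f"{BASE_URL}?{urlencode({'arena': arena})}"


# One URL per arena, since each call accepts a single value.
for name in ("text-to-image", "code/webdev"):
    print(leaderboard_url(name))
```

Note that urlencode percent-encodes the slash in values such as code/webdev.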
- Build a model selection dashboard that surfaces the current top-ranked agents and their task-outcome scores from the agent arena.
- Track ELO-style rating changes for text models over time by polling get_leaderboard with arena=text on a schedule.
- Compare image generation models by pulling text-to-image and image-edit arena rankings side by side, using model_count to normalize comparisons.
- Alert engineering teams when a preferred model drops below a threshold rank in any arena, using confidence interval fields to filter noise.
- Populate a live leaderboard UI that shows total_sessions and last_updated for the agent arena to signal data freshness to end users.
- Audit tool hallucination rates across agent models to shortlist candidates for production agentic workflows.
| Tier | Price | Credits/month | Rate limit |
|---|---|---|---|
| Free | $0/mo | 100 | 5 req/min |
| Hobby | $30/mo | 1,000 | 20 req/min |
| Developer | $100/mo | 5,000 | 250 req/min |
One credit = one API call regardless of which marketplace API you call. Exceeding the rate limit returns a 429 response. Authenticate with the X-API-Key header.
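One way a client could handle the 429 response is to retry with exponential backoff. This is a sketch only: the delays are arbitrary choices rather than documented values, and fetch is a hypothetical zero-argument callable that returns a (status, body) pair. Each retry is still an API call, so whether a rate-limited call consumes a credit is worth checking before retrying aggressively.

```python
import time


def call_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-429 status or attempts run out.

    `fetch` is a hypothetical zero-argument callable returning (status, body).
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: 1s, 2s, 4s, ... with the default base_delay.
            sleep(base_delay * 2 ** attempt)
    return status, body


# Stand-in for a real HTTP call: rate-limited twice, then succeeds.
responses = iter([(429, None), (429, None), (200, {"status": "success"})])
delays = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)  # (200, {'status': 'success'}) [1.0, 2.0]
```

Passing sleep as a parameter keeps the helper testable without real waiting.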
Does arena.ai have an official developer API?
How does the agent arena response differ from other arenas?
The agent arena returns signal-based sub-scores, a total_sessions count, and a last_updated ISO datetime. Non-agent arenas return ELO-style ratings and a leaderboard_slug field instead. Both include arena, models, and model_count.
Can I retrieve historical leaderboard rankings or track rank changes over time?
Are video or code arenas available as separate arena values?
Yes. Alongside agent, text, text-to-image, and image-edit, the arena parameter accepts text-to-video, image-to-video, and video-edit for video, and code/webdev and code/image-to-webdev for code. You can fork this API on Parse and revise it to add support for additional arena slugs if they become available on arena.ai.
What does the `models` array contain for each ranked model?
Each entry in models includes the model's rank, score, and confidence interval. The agent arena entries also carry the individual signal-based sub-scores. Exact field names per model depend on the arena; the agent arena entries are the most granular.
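To illustrate working with the agent arena's sub-scores, the sketch below orders entries by one signal from signal_scores. It assumes a higher value is better for the chosen signal, which the listing does not state and may not hold for every signal. rank_by_signal is a hypothetical helper, and only the first entry comes from the sample response.

```python
def rank_by_signal(models, signal):
    """Sort agent-arena entries by one signal score, highest first.

    Entries without that signal are skipped rather than treated as zero.
    """
    scored = [m for m in models if signal in m.get("signal_scores", {})]
    return sorted(scored, key=lambda m: m["signal_scores"][signal], reverse=True)


models = [
    # Mirrors the sample response entry.
    {"model": "GPT 5.5 (High)",
     "signal_scores": {"steerability": 0.0959, "task_outcome_explicit": 0.0613}},
    # Hypothetical entries for illustration only.
    {"model": "model-b", "signal_scores": {"steerability": 0.1200}},
    {"model": "model-c", "signal_scores": {}},
]

for entry in rank_by_signal(models, "steerability"):
    print(entry["model"], entry["signal_scores"]["steerability"])
```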