epoch.ai API
Retrieve FrontierMath benchmark scores for 90+ AI models across Tier 4 and Tier 1-3 difficulty levels, with provider info and release dates.
No input parameters required.
curl -X GET 'https://api.parse.bot/scraper/8633ba07-9301-42b3-905a-68d6f1233019/get_scores' \
  -H "X-API-Key: $PARSE_API_KEY"
Get all FrontierMath benchmark scores for both Tier 4 and Tier 1-3, merged into a single table. Each model has scores for both tiers (null if not evaluated for a tier). Sorted by Tier 4 score descending, then Tier 1-3 score.
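For callers working in Python rather than a shell, the same request can be sketched with the standard library. This is a minimal sketch, not an official client: the helper names `build_request` and `fetch_scores` are made up for illustration, and the `data` wrapper is assumed from the sample response on this page.

```python
import json
import urllib.request

# URL copied from the curl example on this page.
GET_SCORES_URL = (
    "https://api.parse.bot/scraper/"
    "8633ba07-9301-42b3-905a-68d6f1233019/get_scores"
)


def build_request(api_key: str, url: str = GET_SCORES_URL) -> urllib.request.Request:
    """Build a GET request that authenticates with the X-API-Key header."""
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="GET")


def fetch_scores(api_key: str) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(build_request(api_key), timeout=30) as resp:
        return json.load(resp)


# Usage (hypothetical, requires a valid key in the environment):
#   import os
#   payload = fetch_scores(os.environ["PARSE_API_KEY"])
#   models = payload["data"]["models"]  # assumes the "data" wrapper shown in the sample
```

Keeping request construction separate from the network call makes the header and URL easy to check without spending a credit.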
{
  "type": "object",
  "fields": {
    "models": "array of model objects with model_name, model_id, provider, release_date, tier4_score, tier4_score_pct, tier4_error, tier4_error_pct, tier13_score, tier13_score_pct, tier13_error, tier13_error_pct",
    "tier4_task": "string - Tier 4 task identifier",
    "tier13_task": "string - Tier 1-3 task identifier",
    "tier4_count": "integer - number of models with Tier 4 scores",
    "tier13_count": "integer - number of models with Tier 1-3 scores",
    "total_models": "integer - total unique models across both tiers"
  },
  "sample": {
    "data": {
      "models": [
        {
          "model_id": "gpt-5.5-pro-pre-release_high",
          "provider": "OpenAI",
          "model_name": "GPT-5.5 Pro (high)",
          "tier4_error": 0.071,
          "tier4_score": 0.396,
          "release_date": "2026-04-23",
          "tier13_error": 0.029,
          "tier13_score": 0.524,
          "tier4_error_pct": 7.1,
          "tier4_score_pct": 39.6,
          "tier13_error_pct": 2.9,
          "tier13_score_pct": 52.4
        }
      ],
      "tier4_task": "FrontierMath-Tier-4-2025-07-01-Private",
      "tier13_task": "FrontierMath-2025-02-28-Private",
      "tier4_count": 62,
      "tier13_count": 92,
      "total_models": 96
    },
    "status": "success"
  }
}
About the epoch.ai API
This API exposes 3 endpoints that return FrontierMath benchmark scores from Epoch AI's leaderboard, covering 90+ AI models from OpenAI, Anthropic, Google DeepMind, Meta AI, DeepSeek, and others. The get_scores endpoint merges Tier 4 and Tier 1-3 results into a single table, while dedicated endpoints for each tier return score percentages, error margins, model release dates, and provider organization for every evaluated model.
Endpoints and Coverage
The API provides three endpoints covering Epoch AI's FrontierMath leaderboard. get_scores returns a unified view of both difficulty tiers merged into one array, where each model object carries tier4_score, tier4_score_pct, tier13_score, tier13_score_pct, and associated error fields — with null values where a model has not been evaluated on a particular tier. The response also includes summary counts: tier4_count, tier13_count, and total_models.
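Because a missing tier arrives as JSON null (Python `None` after decoding), client code has to check for it before doing arithmetic. The sketch below classifies each record by which tiers it has scores for. The function name `tier_coverage` and the second record are hypothetical; only the first record's values come from the sample response.

```python
def tier_coverage(model: dict) -> str:
    """Report which tiers a merged get_scores record has scores for."""
    has_t4 = model.get("tier4_score") is not None
    has_t13 = model.get("tier13_score") is not None
    if has_t4 and has_t13:
        return "both"
    if has_t4:
        return "tier4_only"
    if has_t13:
        return "tier13_only"
    return "none"


models = [
    # Values taken from the sample response on this page.
    {"model_id": "gpt-5.5-pro-pre-release_high", "tier4_score": 0.396, "tier13_score": 0.524},
    # Hypothetical record illustrating a model not evaluated on Tier 4.
    {"model_id": "hypothetical-model", "tier4_score": None, "tier13_score": 0.1},
]

single_tier = [m["model_id"] for m in models if tier_coverage(m) != "both"]
print(single_tier)  # ['hypothetical-model']
```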
Tier-Specific Endpoints
get_tier4_scores targets the hardest FrontierMath problems (approximately 50 in total) and returns all models that have been evaluated at that level, sorted by score descending. Each model object includes display_name, model_id, release_date, organization, score_pct, and score_error. get_tier13_scores covers the 290-problem Tier 1-3 set and returns the same field shape. Both endpoints expose the task identifier string, which identifies the benchmark task tracked on the leaderboard.
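Since the tier-specific records use display_name and organization while the merged get_scores sample uses model_name and provider, code that consumes more than one endpoint may want a common shape. The following is a minimal sketch under that assumption; `normalize_identity` and the tier-specific example record are hypothetical, and the exact tier-specific payload should be checked against a live response.

```python
def normalize_identity(record: dict) -> dict:
    """Map either record shape onto one set of identity fields."""
    return {
        "model_id": record.get("model_id"),
        # Tier-specific endpoints: display_name; merged endpoint: model_name.
        "name": record.get("display_name") or record.get("model_name"),
        # Tier-specific endpoints: organization; merged endpoint: provider.
        "provider": record.get("organization") or record.get("provider"),
        "release_date": record.get("release_date"),
    }


# Hypothetical tier-specific record, built from the field names listed above.
tier_record = {
    "display_name": "Example Model",
    "model_id": "example-model",
    "organization": "Example Lab",
    "release_date": "2025-01-01",
}
# Merged record, using values from the get_scores sample response.
merged_record = {
    "model_name": "GPT-5.5 Pro (high)",
    "model_id": "gpt-5.5-pro-pre-release_high",
    "provider": "OpenAI",
    "release_date": "2026-04-23",
}

print(normalize_identity(tier_record)["provider"])
print(normalize_identity(merged_record)["name"])
```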
Data Shape and Sorting
All three endpoints sort results by score descending — get_scores sorts by Tier 4 score first, then Tier 1-3 score as a tiebreaker. Model identity is consistent across endpoints via model_id. Provider attribution is available as organization (in the tier-specific endpoints) or provider (in get_scores). None of the endpoints require input parameters; each call returns the full current leaderboard snapshot.
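If records are re-filtered or merged client-side, the documented ordering can be reproduced with a sort key. This is a sketch rather than the service's actual implementation: placing models with a null score after scored models is an assumption, and all rows below are hypothetical.

```python
def leaderboard_key(model: dict) -> tuple:
    """Sort by Tier 4 score descending, then Tier 1-3 score descending.

    Assumption: records with a null score sort after scored records.
    """
    t4 = model.get("tier4_score")
    t13 = model.get("tier13_score")
    # False sorts before True, so scored models come first; negating the
    # score turns Python's ascending sort into a descending one.
    return (t4 is None, -(t4 or 0.0), t13 is None, -(t13 or 0.0))


# Hypothetical rows chosen to exercise the tiebreaker and the null handling.
rows = [
    {"model_id": "c", "tier4_score": None, "tier13_score": 0.30},
    {"model_id": "a", "tier4_score": 0.20, "tier13_score": 0.40},
    {"model_id": "b", "tier4_score": 0.20, "tier13_score": 0.50},
    {"model_id": "d", "tier4_score": 0.35, "tier13_score": None},
]

ranked = [m["model_id"] for m in sorted(rows, key=leaderboard_key)]
print(ranked)  # ['d', 'b', 'a', 'c']
```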
- Track which AI models lead the FrontierMath Tier 4 leaderboard using score_pct and score_error fields.
- Compare Tier 4 vs Tier 1-3 performance gaps for a given model using the merged get_scores endpoint.
- Filter models by organization to monitor benchmarks for a specific AI lab such as Anthropic or DeepSeek.
- Plot score progression over time using release_date alongside score_pct across model generations.
- Identify models that have only been evaluated on one tier by checking for null scores in the merged response.
- Build a research dashboard showing frontier math reasoning capability across 90+ models and multiple providers.
- Alert pipelines when new models appear on the leaderboard by comparing total_models across periodic calls.
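Two of the use cases above can be sketched in a few lines. The helper names `tier_gap_pct` and `new_model_ids` are hypothetical. Note that the second helper compares model_id sets between snapshots instead of the total_models count; that is a substitution on our part, chosen because it also reports which models are new.

```python
from typing import Optional


def tier_gap_pct(model: dict) -> Optional[float]:
    """Tier 1-3 minus Tier 4 score in percentage points, or None if a tier is missing."""
    t4 = model.get("tier4_score_pct")
    t13 = model.get("tier13_score_pct")
    if t4 is None or t13 is None:
        return None
    return round(t13 - t4, 1)


def new_model_ids(previous: list, current: list) -> set:
    """Return model_ids present in the current snapshot but not in the previous one."""
    seen = {m["model_id"] for m in previous}
    return {m["model_id"] for m in current} - seen


# Percentages taken from the sample response on this page.
sample = {
    "model_id": "gpt-5.5-pro-pre-release_high",
    "tier4_score_pct": 39.6,
    "tier13_score_pct": 52.4,
}
print(tier_gap_pct(sample))  # 12.8

# Hypothetical second snapshot containing one additional model.
later = [sample, {"model_id": "hypothetical-new-model"}]
print(new_model_ids([sample], later))  # {'hypothetical-new-model'}
```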
| Tier | Price | Credits/month | Rate limit |
|---|---|---|---|
| Free | $0/mo | 100 | 5 req/min |
| Hobby | $30/mo | 1,000 | 20 req/min |
| Developer | $100/mo | 5,000 | 250 req/min |
One credit = one API call regardless of which marketplace API you call. Exceeding the rate limit returns a 429 response. Authenticate with the X-API-Key header.
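A client that polls on a schedule may occasionally hit the 429 response. One common way to handle it is exponential backoff, sketched below. The function `call_with_backoff` and its parameters are hypothetical, and the page does not say whether the service sends a Retry-After header, so this sketch does not rely on one. Retried calls may each consume a credit; the page does not specify.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a status other than 429 or attempts run out.

    send is any zero-argument callable returning (status_code, body).
    The delay doubles after each rate-limited attempt.
    """
    status, body = 0, b""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body


# Demonstration with a fake sender: two rate-limited replies, then success.
replies = iter([(429, b""), (429, b""), (200, b'{"status": "success"}')])
delays = []
result = call_with_backoff(lambda: next(replies), sleep=delays.append)
print(result[0], delays)  # 200 [1.0, 2.0]
```

Passing the sleep function as a parameter keeps the retry logic testable without real waiting.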
Does Epoch AI have an official developer API for FrontierMath data?
What does the `get_scores` endpoint return that the tier-specific endpoints don't?
get_scores merges both tiers into one array and exposes tier4_score, tier4_score_pct, tier4_error, tier13_score, and tier13_score_pct as parallel fields on each model object. It also returns tier4_count, tier13_count, and total_models as top-level summary integers. The tier-specific endpoints (get_tier4_scores, get_tier13_scores) return a simpler shape focused on one tier's score, score_pct, and score_error, and include a display_name field not present in the merged response.
Can I filter results by provider or retrieve scores for a single model by ID?
Filtering by organization/provider or selecting by model_id needs to be done client-side on the returned array. You can fork this API on Parse and revise it to add a filtered endpoint that accepts an organization name or model_id as a query parameter.
Does the API include historical scores or benchmark results beyond FrontierMath?
How fresh is the leaderboard data, and are all 90+ models guaranteed to have scores on both tiers?
get_scores uses null for missing tier scores, and tier4_count vs tier13_count will typically differ. New models appear once Epoch AI adds them to the leaderboard; there is no fixed refresh schedule.
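The client-side filtering mentioned in the FAQ could look like the sketch below, which works on the models array from get_scores. The helper names are hypothetical, and the case-insensitive provider match is a convenience we chose, not documented behavior. For the tier-specific endpoints the key would be organization instead of provider.

```python
from typing import Optional


def filter_by_provider(models: list, provider: str) -> list:
    """Return records whose provider matches, ignoring case."""
    wanted = provider.casefold()
    return [m for m in models if (m.get("provider") or "").casefold() == wanted]


def find_by_model_id(models: list, model_id: str) -> Optional[dict]:
    """Return the first record with the given model_id, or None."""
    return next((m for m in models if m.get("model_id") == model_id), None)


models = [
    # Values taken from the sample response on this page.
    {"model_id": "gpt-5.5-pro-pre-release_high", "provider": "OpenAI"},
    # Hypothetical record for illustration.
    {"model_id": "hypothetical-model", "provider": "Example Lab"},
]

print([m["model_id"] for m in filter_by_provider(models, "openai")])
print(find_by_model_id(models, "missing-id"))
```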