
Epoch API (epoch.ai)

Retrieve FrontierMath benchmark scores for 100+ AI models across Tier 4 and Tier 1-3 difficulty levels, including mean scores, standard errors, and provider info.

Developer Tools education other

Endpoint health

Verified 4d ago · 3/3 passing latest check · self-healing

Endpoints: get_tier4_scores, get_tier13_scores, get_scores

Updated 26d ago

What is the Epoch API?

This API exposes FrontierMath benchmark scores from Epoch AI's leaderboard across 3 endpoints, covering 100+ AI models evaluated on mathematical reasoning tasks at two difficulty levels. The get_scores endpoint returns a unified view merging Tier 4 (the hardest ~50 problems) and Tier 1-3 (290 problems) into a single sorted response, while dedicated endpoints isolate each tier. Every model record includes mean score, standard error, best score, release date, and organization.

Try it

No input parameters required.

→ api.parse.bot/scraper/8633ba07-9301-42b3-905a-68d6f1233019/<endpoint>


Call it over HTTP

Grab a free API key at signup.

curl -X GET 'https://api.parse.bot/scraper/8633ba07-9301-42b3-905a-68d6f1233019/get_scores' \
  -H "X-API-Key: $PARSE_API_KEY"
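
The same call can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names build_request and fetch are hypothetical, and the base URL and X-API-Key header are copied from the curl example above.

```python
import json
import urllib.request

# Base URL copied from the curl example; the endpoint name is appended to it.
BASE_URL = "https://api.parse.bot/scraper/8633ba07-9301-42b3-905a-68d6f1233019"


def build_request(endpoint: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for one endpoint (hypothetical helper)."""
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        headers={"X-API-Key": api_key},
        method="GET",
    )


def fetch(endpoint: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (hypothetical helper)."""
    request = build_request(endpoint, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (needs network access and a real key, so it is left commented out):
# import os
# payload = fetch("get_scores", os.environ["PARSE_API_KEY"])
```

Keeping request construction separate from sending makes the header and URL easy to check without spending a credit.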

Python SDK · recommended

Typed, relational, agent-ready

A generated client with real types, enums, and the links between objects — the structure a flat JSON response can't carry. Autocompletes in your editor and reads cleanly to coding agents.

Fully typed · autocompletes
Objects link to objects
Typed errors & pagination

Typed Python client. Set up the SDK in your uv project, then pull this API’s typed client:

uv add parse-sdk
uv run parse init
uv run parse add --marketplace epoch-ai-api

uv run parse add --marketplace pulls a pinned snapshot of this canonical API — it won’t change underneath you. To customize it, subscribe and swap to your own copy.

"""Walkthrough: FrontierMath SDK — browse AI model benchmark scores."""
from parse_apis.epoch_ai_frontiermath_benchmark_api import FrontierMath, TierScore, Model, UpstreamError

client = FrontierMath()

# List merged leaderboard (both tiers), capped at 5 models.
for model in client.models.list(limit=5):
    # Compare against None so a genuine 0% score is not reported as n/a.
    t4 = f"Tier4: {model.tier4_score_pct}%" if model.tier4_score_pct is not None else "Tier4: n/a"
    t13 = f"Tier1-3: {model.tier13_score_pct}%" if model.tier13_score_pct is not None else "Tier1-3: n/a"
    print(f"{model.model_name} ({model.provider}) — {t4}, {t13}")

# Drill into Tier 4 hardest-problem scores.
top_tier4 = client.tierscores.tier4(limit=1).first()
if top_tier4:
    print(f"Top Tier 4: {top_tier4.display_name} — {top_tier4.score_pct}% (org: {top_tier4.organization})")

# Iterate Tier 1-3 scores, print top 3.
for score in client.tierscores.tier13(limit=3):
    print(f"{score.display_name}: {score.score_pct}% (error ±{score.score_error_pct}%)")

# Typed error handling around a call.
try:
    for m in client.models.list(limit=2):
        print(m.model_id, m.release_date)
except UpstreamError as exc:
    print(f"Upstream issue: {exc}")

print("exercised: models.list / tierscores.tier4 / tierscores.tier13")

All endpoints · 3 total

get_scores

Fetches all FrontierMath benchmark scores for both Tier 4 and Tier 1-3, merged into a single response. Each model carries scores for both tiers (null when not evaluated for a tier). Models are sorted by Tier 4 score descending, then Tier 1-3 score. A single CSV fetch powers the response — no pagination, no parameters.

Input

No input parameters required.

Response

{
  "type": "object",
  "fields": {
    "models": "array of model objects with tier4 and tier13 scores merged",
    "tier4_task": "string - Tier 4 task identifier used upstream",
    "tier13_task": "string - Tier 1-3 task identifier used upstream",
    "tier4_count": "integer - number of models evaluated on Tier 4",
    "tier13_count": "integer - number of models evaluated on Tier 1-3",
    "total_models": "integer - total unique models across both tiers"
  },
  "sample": {
    "data": {
      "models": [
        {
          "model_id": "gdm-ai-co-mathematician",
          "provider": "Google DeepMind",
          "model_name": "AI co-mathematician",
          "tier4_error": 0.072,
          "tier4_score": 0.479,
          "release_date": "2026-05-08",
          "tier13_error": null,
          "tier13_score": null,
          "tier4_error_pct": 7.2,
          "tier4_score_pct": 47.9,
          "tier13_error_pct": null,
          "tier13_score_pct": null
        }
      ],
      "tier4_task": "FrontierMath-Tier-4-2025-07-01-Private",
      "tier13_task": "FrontierMath-2025-02-28-Private",
      "tier4_count": 71,
      "tier13_count": 100,
      "total_models": 105
    },
    "status": "success"
  }
}
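
As a rough illustration of reading this payload, the sketch below walks a trimmed copy of the sample and prints one line per model. It assumes the live response uses the same data/status envelope as the sample; fmt_score and describe are hypothetical helper names.

```python
from typing import Optional

# Trimmed copy of the sample response shown above.
SAMPLE = {
    "data": {
        "models": [
            {
                "model_id": "gdm-ai-co-mathematician",
                "provider": "Google DeepMind",
                "model_name": "AI co-mathematician",
                "tier4_score_pct": 47.9,
                "tier4_error_pct": 7.2,
                "tier13_score_pct": None,
                "tier13_error_pct": None,
            }
        ],
        "tier4_count": 71,
        "tier13_count": 100,
        "total_models": 105,
    },
    "status": "success",
}


def fmt_score(score_pct: Optional[float], error_pct: Optional[float]) -> str:
    """Render 'score% ± error%', or 'n/a' when the tier was not evaluated."""
    if score_pct is None:
        return "n/a"
    if error_pct is None:
        return f"{score_pct:.1f}%"
    return f"{score_pct:.1f}% ± {error_pct:.1f}%"


def describe(model: dict) -> str:
    """One summary line per model, tolerating null tier blocks."""
    t4 = fmt_score(model.get("tier4_score_pct"), model.get("tier4_error_pct"))
    t13 = fmt_score(model.get("tier13_score_pct"), model.get("tier13_error_pct"))
    return f"{model['model_name']} ({model['provider']}): Tier 4 {t4}, Tier 1-3 {t13}"


lines = [describe(m) for m in SAMPLE["data"]["models"]]
print("\n".join(lines))
```

The None checks matter because a model that was not evaluated on a tier carries null for every field of that tier.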

About the Epoch API

Endpoints and What They Return

The get_scores endpoint is the primary entry point: it returns a merged array of all model objects sorted by Tier 4 score descending, then by Tier 1-3 score. Each object carries both tier4 and tier13 score blocks, with null values where a model was not evaluated on a given tier. The response also includes tier4_count, tier13_count, and total_models integers so you can quickly understand coverage without iterating the array.
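
If you re-sort or merge the array client-side, the documented ordering can be approximated as below. This is a sketch under one stated assumption: the page does not say where null scores fall, so the key here places them last. The rows are hypothetical and exist only to exercise the key.

```python
def merged_sort_key(model: dict) -> tuple:
    """Tier 4 descending, then Tier 1-3 descending; nulls last (assumption)."""
    t4 = model.get("tier4_score")
    t13 = model.get("tier13_score")
    # False sorts before True, so evaluated models come first;
    # negating the score turns ascending sort into descending order.
    return (t4 is None, -(t4 or 0.0), t13 is None, -(t13 or 0.0))


# Hypothetical rows, not real leaderboard data.
rows = [
    {"model_id": "a", "tier4_score": None, "tier13_score": 0.30},
    {"model_id": "b", "tier4_score": 0.10, "tier13_score": None},
    {"model_id": "c", "tier4_score": 0.10, "tier13_score": 0.25},
    {"model_id": "d", "tier4_score": 0.40, "tier13_score": 0.20},
]

ordered = [m["model_id"] for m in sorted(rows, key=merged_sort_key)]
print(ordered)

# The coverage counters can be cross-checked against the array the same way.
tier4_count = sum(m["tier4_score"] is not None for m in rows)
tier13_count = sum(m["tier13_score"] is not None for m in rows)
```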

get_tier4_scores isolates Tier 4 results — the hardest problems on the benchmark, approximately 50 in total. get_tier13_scores does the same for the 290-problem Tier 1-3 set. Both endpoints return the same per-model shape: display_name, organization, release_date, mean_score, standard_error, and best_score, along with a task string identifying the benchmark variant and a total integer.

Coverage and Data Shape

The leaderboard covers models from major AI labs, including OpenAI, Anthropic, and Google DeepMind. Release dates let you correlate score improvements with model generations. Each mean score comes with its standard error, which matters when comparing models with similar performance: a gap smaller than the combined errors may not be meaningful. Models that appear in both tiers are represented by a single object in get_scores rather than by duplicate rows.
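
One common way to use the standard errors is to express the gap between two models in units of their combined error. The sketch below treats the two estimates as independent, which is only an approximation because models are scored on the same problem set; the function name is hypothetical, and the second model's numbers are made up for illustration.

```python
import math


def gap_in_standard_errors(score_a: float, err_a: float,
                           score_b: float, err_b: float) -> float:
    """(score_a - score_b) divided by sqrt(err_a**2 + err_b**2)."""
    combined = math.hypot(err_a, err_b)
    if combined == 0:
        raise ValueError("both standard errors are zero")
    return (score_a - score_b) / combined


# First pair: tier4_score and tier4_error from the sample response.
# Second pair: hypothetical values, not real leaderboard data.
z = gap_in_standard_errors(0.479, 0.072, 0.30, 0.065)
print(f"gap = {z:.2f} combined standard errors")
```

A value near or below 2 suggests the ranking between the two models could plausibly flip on a re-run, so treat it as a tie rather than a clear lead.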

Source Context

FrontierMath is a benchmark designed to test mathematical reasoning at research level. Epoch AI publishes and maintains the leaderboard at epoch.ai. The tiered structure separates problems by difficulty: Tier 1-3 covers standard to moderate difficulty across 290 problems, while Tier 4 targets the hardest subset. This API reflects the leaderboard as Epoch AI publishes it.

Reliability & maintenance

The Epoch API is a managed, monitored endpoint for epoch.ai — not a raw scraper you maintain. Every endpoint is automatically health-checked on a schedule, and when epoch.ai changes and a check fails, the API is automatically queued for repair and re-verified. It is built to keep working as the site underneath it changes.

This isn't an official epoch.ai API — it's an independent, maintained REST wrapper over public data. Where the source has no official API (or only a limited one), Parse gives you a stable contract over a source that never promised one, and keeps it current. Need a new endpoint or field? You can revise it yourself in plain English and the agent rebuilds it against the live site in minutes — contributing the change back to the shared API is free.


Will this API break when the source site changes?

It's built not to. Every endpoint is health-checked on a schedule with automated test probes. When the source site changes and a check fails, the API is automatically queued for repair and re-verified — that's the self-healing layer. Each API page shows when its endpoints were last verified. And because marketplace APIs are shared, any fix reaches everyone using it.

Is this an official API from the source site?

No — Parse APIs are independent, managed REST wrappers over publicly available data. That is the point: where a site has no official API (or only a limited one), Parse gives you a maintained, monitored endpoint for that data and keeps it working as the site changes — so you get a stable contract over a source that never promised one.

Can I fix or extend this API myself if I need a new endpoint or field?

Yes — and you don't have to wait on us. This API was generated by the Parse agent, which stays attached. Describe the change in plain English ("add an endpoint that returns reviews", "fix the price field") in the revise box on the API page or via the revise_api MCP tool, and the agent rebuilds it against the live site in minutes. Contributing the change back to the public API is free.

What happens if I call an endpoint that has an issue?

Errors are machine-readable: a bad call returns a clean status with the list of available endpoints and a repair hint, so an agent (or you) can recover or trigger a fix instead of failing silently. Confirmed failures feed the automatic repair queue.

Common use cases

Track which AI model currently leads on Tier 4 mathematical reasoning problems using the sorted get_tier4_scores response
Compare mean scores and standard errors across models from different providers to assess statistical significance of performance gaps
Plot score progression over time by combining release_date fields with mean scores from get_scores
Filter models by organization to benchmark a single lab's lineup across both difficulty tiers
Build a dashboard that surfaces the top-N models on each tier, using tier4_count and tier13_count from get_scores to show how many models each tier covers
Identify models evaluated on Tier 4 but not Tier 1-3 (or vice versa) using null tier scores in the merged response
Research how model release cadence from major AI labs correlates with FrontierMath score improvements
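
Two of these use cases, top-N per tier and spotting models evaluated on only one tier, reduce to a few lines over the merged models array. The sketch below assumes the field names from the get_scores sample; the helper names and the rows are hypothetical.

```python
def top_n(models: list, field: str, n: int) -> list:
    """Highest-scoring models on one tier, skipping models not evaluated on it."""
    scored = [m for m in models if m.get(field) is not None]
    return sorted(scored, key=lambda m: m[field], reverse=True)[:n]


def evaluated_only_on(models: list, present: str, absent: str) -> list:
    """Models with a score in `present` but null in `absent`."""
    return [
        m for m in models
        if m.get(present) is not None and m.get(absent) is None
    ]


# Hypothetical rows, not real leaderboard data.
rows = [
    {"model_id": "m1", "tier4_score": 0.40, "tier13_score": None},
    {"model_id": "m2", "tier4_score": 0.15, "tier13_score": 0.45},
    {"model_id": "m3", "tier4_score": None, "tier13_score": 0.50},
    {"model_id": "m4", "tier4_score": 0.25, "tier13_score": 0.35},
]

top2_tier4 = [m["model_id"] for m in top_n(rows, "tier4_score", 2)]
tier4_only = [m["model_id"] for m in evaluated_only_on(rows, "tier4_score", "tier13_score")]
print(top2_tier4, tier4_only)
```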

Pricing & limits

Tier	Price	Credits/month	Rate limit
Free	$0/mo	100	5 req/min
Hobby	$30/mo	1,000	20 req/min
Developer	$100/mo	5,000	100 req/min

One credit = one API call regardless of which marketplace API you call. Exceeding the rate limit returns a 429 response. Authenticate with the X-API-Key header.
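
A client can handle the 429 case by waiting and retrying. The sketch below is one possible approach, not documented behavior of the service: call_with_backoff is a hypothetical helper, the send callable stands in for whatever HTTP call you use, and the 12-second base delay is simply 60 seconds divided by the Free plan's 5 requests per minute.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 12.0,  # 60 s / 5 req per minute on the Free plan
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Retry on HTTP 429, doubling the wait each time; return the last result."""
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body


# Simulated responses: two rate-limit hits, then success.
responses = iter([(429, {}), (429, {}), (200, {"status": "success"})])
waits = []
status, body = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(status, waits)
```

Whether a rate-limited call consumes a credit is not stated on this page, so it is worth capping max_attempts conservatively.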

Frequently asked questions

Does Epoch AI offer an official developer API for FrontierMath data?

Epoch AI does not publish a public developer API for the FrontierMath leaderboard. Their site at epoch.ai presents the leaderboard as a web page rather than a documented REST or GraphQL service.

What is the difference between `get_scores` and the two tier-specific endpoints?

get_scores returns a single merged array covering all unique models from both tiers, with each model carrying score fields for both Tier 4 and Tier 1-3 (null where not evaluated). get_tier4_scores and get_tier13_scores each return only the models evaluated on that specific tier, with no null score fields, plus a task string identifying the benchmark variant.

Does the API include individual problem-level results or only aggregate scores?

The API returns only aggregate, model-level statistics as published on the FrontierMath leaderboard: mean score, standard error, and best score. Individual problem-level results are not currently exposed. You can fork this API on Parse and revise it to add an endpoint for problem-level breakdowns if Epoch AI surfaces that data.

Can I filter results by organization or model release date?

The endpoints do not accept filter parameters; all three return the full model set without server-side filtering. Every model object carries a release_date and an organization field (named provider in the get_scores sample above), so client-side filtering on those fields is straightforward. You can fork this API on Parse and revise it to add query parameters that pre-filter by organization or date range.
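
Client-side filtering might look like the sketch below. It assumes the get_scores field names provider and release_date with ISO dates, as in the sample response; filter_models is a hypothetical helper, and every row except the first is made up.

```python
from datetime import date
from typing import Optional


def filter_models(
    models: list,
    provider: Optional[str] = None,
    released_from: Optional[date] = None,
    released_to: Optional[date] = None,
) -> list:
    """Keep models matching the provider and falling inside the date range."""
    kept = []
    for model in models:
        if provider is not None and model.get("provider") != provider:
            continue
        if released_from is not None or released_to is not None:
            raw = model.get("release_date")
            if not raw:
                continue  # no date, so it cannot satisfy a date filter
            released = date.fromisoformat(raw)
            if released_from is not None and released < released_from:
                continue
            if released_to is not None and released > released_to:
                continue
        kept.append(model)
    return kept


rows = [
    # First row uses values from the sample response; the rest are hypothetical.
    {"model_id": "gdm-ai-co-mathematician", "provider": "Google DeepMind", "release_date": "2026-05-08"},
    {"model_id": "example-a", "provider": "Example Lab", "release_date": "2025-03-01"},
    {"model_id": "example-b", "provider": "Example Lab", "release_date": None},
]

recent = filter_models(rows, released_from=date(2026, 1, 1))
lab = filter_models(rows, provider="Example Lab")
print([m["model_id"] for m in recent], [m["model_id"] for m in lab])
```

For the tier-specific endpoints, swap provider for organization, since that is the field name this page lists for them.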

How current is the benchmark data?

The data reflects Epoch AI's FrontierMath leaderboard as it is currently published. Epoch AI updates the leaderboard when new model evaluations are submitted and validated; the API returns whatever state the leaderboard is in at the time of the request. There is no historical snapshot or changelog endpoint — only the current published standings.

Page content last updated June 11, 2026. Spec covers 3 endpoints from epoch.ai.

Related APIs in Developer Tools

artificialanalysis.ai API

Compare and rank LLM models and providers across performance benchmarks, then dive into detailed specifications for any model to find the best fit for your needs. Discover performance metrics for specialized AI systems handling speech, images, and video, plus benchmark data for different hardware configurations.

lmarena.ai API

developers.openai.com API

Check current pricing for all OpenAI models including GPT, image generation, audio, video, embeddings, and fine-tuning across different pricing tiers like Batch, Flex, Standard, and Priority. Get real-time cost information to compare rates and plan your API spending.

modelscope.cn API

Browse and retrieve top-performing AI models and explore curated datasets from ModelScope.cn, China's premier AI model community. Discover the latest models ranked by popularity and access comprehensive dataset collections for your machine learning projects.

hackerrank.com API

Retrieve challenge scores, difficulty ratings, success ratios, and track-level ranking data from HackerRank's public practice platform. Browse challenges by track, view submission statistics, and access ranking metrics across all available tracks.

ollama.com API