Wikipedia API (Wikipedia.org)
Search Wikipedia articles, retrieve full text and metadata, and browse category trees via 3 structured endpoints. Responses are JSON and include fields such as pageid, extract, and categories.
curl -X GET 'https://api.parse.bot/scraper/cabed3f6-f0a2-4ce6-ab14-897da89a04db/search_articles?limit=5&query=Python+%28programming+language%29' \
  -H "X-API-Key: $PARSE_API_KEY"
Search Wikipedia articles by keyword query. Returns matching articles with titles, snippets, word counts, and timestamps. Supports pagination via offset.
| Param | Type | Description |
|---|---|---|
| limit | integer | Number of results to return (1-50). |
| query | string | Search query string. |
| offset | integer | Pagination offset for fetching next page of results. |
{
"type": "object",
"fields": {
"query": "search query string that was used",
"offset": "current pagination offset",
"articles": "array of article summaries with pageid, title, snippet, size, wordcount, and timestamp",
"total_hits": "total number of matching articles",
"next_offset": "offset for the next page of results, or null if no more pages"
},
"sample": {
"data": {
"query": "artificial intelligence",
"offset": 0,
"articles": [
{
"size": 269229,
"title": "Artificial intelligence",
"pageid": 1164,
"snippet": "Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning",
"timestamp": "2026-05-02T09:37:23Z",
"wordcount": 26715
}
],
"total_hits": 27451,
"next_offset": 5
},
"status": "success"
}
}
About the Wikipedia API
The Wikipedia API provides 3 endpoints that cover article search, detailed content retrieval, and category browsing across all of English Wikipedia. search_articles returns ranked results with snippets, word counts, and timestamps. get_article_details returns the article extract (the introduction by default, or the full text on request), categories, revision ID, and byte length, looked up by either title or page ID. get_category_members lets you walk the category tree, filtering members by type and paginating with a continuation token.
Endpoints and What They Return
The search_articles endpoint accepts a query string and returns an articles array where each item includes pageid, title, snippet, size, wordcount, and timestamp. The total_hits field tells you how many results exist across all pages, and next_offset provides a direct value to pass into the offset parameter for the next page. Results are limited to a maximum of 50 per call.
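The offset loop described above can be sketched in Python as follows. This is a minimal sketch that assumes every response uses the envelope shown in the sample (a `data` object holding `articles` and `next_offset`) and that all endpoints share the base path from the curl example. `build_url`, `http_get`, and `iter_search_results` are hypothetical helper names; `fetch` is any callable that performs one request, which keeps the paging logic testable without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

# Base path from the curl example; assuming every endpoint lives under it.
BASE_URL = "https://api.parse.bot/scraper/cabed3f6-f0a2-4ce6-ab14-897da89a04db"


def build_url(endpoint: str, params: dict) -> str:
    """Join an endpoint name and URL-encoded query parameters onto the base path."""
    return f"{BASE_URL}/{endpoint}?{urllib.parse.urlencode(params)}"


def http_get(endpoint: str, params: dict, api_key: str) -> dict:
    """Make one authenticated GET call (one credit) and decode the JSON body."""
    req = urllib.request.Request(build_url(endpoint, params),
                                 headers={"X-API-Key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_search_results(query: str, fetch: Callable[[dict], dict],
                        limit: int = 50,
                        max_pages: Optional[int] = None) -> Iterator[dict]:
    """Yield article summaries, following next_offset until it is null."""
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    offset, pages = 0, 0
    while True:
        data = fetch({"query": query, "limit": limit, "offset": offset})["data"]
        yield from data["articles"]
        pages += 1
        next_offset = data.get("next_offset")
        # Stop on the last page, or early once the caller's page budget is spent.
        if next_offset is None or (max_pages is not None and pages >= max_pages):
            return
        offset = next_offset
```

A page budget such as `max_pages` is worth passing: draining all 27,451 hits from the sample at 50 per call would take 550 calls, well beyond the Free tier's 100 monthly credits.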
Article Details
get_article_details requires either a title (e.g., 'Artificial intelligence') or a pageid (e.g., '21721040'). By default it returns the introductory extract only; set full_extract to 'true' to retrieve the complete article text. The response includes url, length in bytes, language, last_revision_id, content_model, and a categories array listing every category the article belongs to.
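A small Python sketch of building get_article_details parameters. The docs say a title or a pageid is required but do not say what happens if both are sent, so this sketch assumes exactly one should be passed. It sends full_extract as the string 'true' because that is how the docs write it, and omits it otherwise to keep the introduction-only default. `article_details_params` is a hypothetical helper name.

```python
from typing import Optional, Union


def article_details_params(title: Optional[str] = None,
                           pageid: Optional[Union[int, str]] = None,
                           full_extract: bool = False) -> dict:
    """Build query parameters for get_article_details.

    Assumption: exactly one of title or pageid is sent, to avoid ambiguity.
    """
    if (title is None) == (pageid is None):
        raise ValueError("pass exactly one of title or pageid")
    params = {"title": title} if title is not None else {"pageid": str(pageid)}
    if full_extract:
        # Documented as the string 'true'; leaving it out returns only the intro.
        params["full_extract"] = "true"
    return params
```

For example, `article_details_params(pageid=21721040, full_extract=True)` yields `{"pageid": "21721040", "full_extract": "true"}`, ready to be URL-encoded onto the endpoint path.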
Browsing Category Trees
get_category_members accepts a category name without the Category: prefix and returns up to 50 members per call. The type parameter controls what is returned: 'page' for articles, 'subcat' for subcategories, or 'file' for media files. Each member object contains pageid, title, and namespace. Pagination is token-based: the next_continue field from one response is passed as the continue parameter in the next call, and is null when no further results exist.
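The token-based loop can be sketched like this. The `type` and `continue` parameters and the `next_continue` field come from the description above; the `category` parameter name, the `data` envelope, and the `members` array name are assumptions not confirmed by the docs. `iter_category_members` is a hypothetical helper, and `fetch` is any callable that performs one request.

```python
from typing import Callable, Iterator, Optional

VALID_TYPES = {"page", "subcat", "file"}
PREFIX = "Category:"


def iter_category_members(category: str, fetch: Callable[[dict], dict],
                          member_type: str = "page") -> Iterator[dict]:
    """Yield members, passing next_continue back as continue until it is null."""
    if member_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    # The endpoint wants the bare name, so drop a leading "Category:" if present.
    if category.startswith(PREFIX):
        category = category[len(PREFIX):]
    token: Optional[str] = None
    while True:
        # "category" is an assumed parameter name; "type" and "continue" are documented.
        params = {"category": category, "type": member_type}
        if token is not None:
            params["continue"] = token
        data = fetch(params)["data"]
        yield from data["members"]  # assumed name of the member array
        token = data.get("next_continue")
        if token is None:
            return
```

The token is treated as opaque: it is passed back verbatim and never parsed, so the loop keeps working even if the token format changes.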
- Build a knowledge base ingestion pipeline using full_extract to pull complete article text by title.
- Populate autocomplete or search suggestions using search_articles with the snippet and title fields.
- Map topic hierarchies by recursively calling get_category_members with type: 'subcat'.
- Cross-reference article freshness using the timestamp from search results and last_revision_id from article details.
- Resolve ambiguous entity names to canonical Wikipedia pageid values for use in downstream data pipelines.
- Enumerate all articles within a subject area (e.g., 'Machine learning') by paginating through get_category_members with type: 'page'.
- Collect structured metadata (word count, byte length, language, content model) for corpus analysis.
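The topic-hierarchy use case can be sketched as a breadth-first walk over type 'subcat'. This sketch assumes subcategory titles may come back with a "Category:" prefix and strips it before recursing. `walk_subcategories` and the `list_members(name, type)` callable are hypothetical; the callable could wrap a paginating loop over get_category_members. A visited set is kept because Wikipedia's category graph is not a strict tree and can contain cycles.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List

PREFIX = "Category:"


def walk_subcategories(root: str,
                       list_members: Callable[[str, str], Iterable[dict]],
                       max_depth: int = 2) -> Dict[str, List[str]]:
    """Map each expanded category to its direct subcategories, max_depth levels down."""
    tree: Dict[str, List[str]] = {}
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        name, depth = queue.popleft()
        children = []
        for member in list_members(name, "subcat"):
            child = member["title"]
            if child.startswith(PREFIX):
                child = child[len(PREFIX):]
            children.append(child)
            # Expand each category at most once so cycles cannot loop forever.
            if depth + 1 < max_depth and child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
        tree[name] = children
    return tree
```

Each expanded category costs at least one call, so `max_depth` doubles as a credit budget: broad roots fan out quickly.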
| Tier | Price | Credits/month | Rate limit |
|---|---|---|---|
| Free | $0/mo | 100 | 5 req/min |
| Hobby | $30/mo | 1,000 | 20 req/min |
| Developer | $100/mo | 5,000 | 250 req/min |
One credit = one API call regardless of which marketplace API you call. Exceeding the rate limit returns a 429 response. Authenticate with the X-API-Key header.
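To stay under a tier's rate limit, calls need at least 60 / rate seconds between them: 12 s on Free (5 req/min), 3 s on Hobby (20 req/min), and 0.24 s on Developer (250 req/min). The Python sketch below pairs that arithmetic with a retry on HTTP 429 using exponential backoff. It assumes requests are made with `urllib`, which raises `HTTPError` for 4xx responses; the docs do not mention a Retry-After header or whether a rejected call consumes a credit, so neither is relied on. The function names are hypothetical.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def min_interval_seconds(requests_per_minute: int) -> float:
    """Smallest spacing between calls that keeps a client under the limit."""
    return 60.0 / requests_per_minute


def call_with_backoff(call: Callable[[], T], max_retries: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry call() on HTTP 429, doubling the wait each time (1 s, 2 s, 4 s, ...)."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # Re-raise other errors, and give up after the last allowed retry.
            if err.code != 429 or attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

`call` can be any zero-argument function that performs one request; `sleep` is injectable so the retry schedule can be checked without actually waiting.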
Does Wikipedia have an official developer API?
What does `get_article_details` return by default versus with `full_extract: 'true'`?
By default, the extract field contains only the introductory section of the article. Setting full_extract to 'true' replaces it with the complete article text. All other response fields (url, title, length, pageid, language, categories, content_model, and last_revision_id) are returned regardless of that parameter.
Does the API cover languages other than English?
A language field is returned in get_article_details responses, but the current endpoints are scoped to English Wikipedia; non-English editions are not covered. You can fork this API on Parse and revise it to target a different language edition.
Can I retrieve article revision history or diff data?
get_article_details returns last_revision_id for the current article state; full revision history and diff data are not available through these endpoints. You can fork this API on Parse and revise it to add a revision-history endpoint.
How does pagination work across the three endpoints, and are the mechanisms consistent?
search_articles uses an integer offset that you read from next_offset in each response. get_category_members uses an opaque string token: read next_continue from one response and pass it as the continue parameter in the next. Both set their next-page field to null when no further pages exist. get_article_details is a single-record lookup and has no pagination.