Requests for comment/Content API

https://github.com/gwicke/restbase

Request for comment (RFC): Content API
Component: Services
Creation date
Author(s): Gabriel Wicke
Document status: implemented

Problem statement

With the growing popularity of mobile apps, JavaScript in the browser, and moves towards fragment caching (ESI), MediaWiki's content is increasingly accessed through web APIs. The existing MediaWiki API is not optimized for high-volume content access. Per-request overhead is relatively high (20-30ms), and caching and URL rewriting are difficult because the URL scheme is not deterministic and many end points are POST-based.

The storage service RFC proposes a REST-style content interface for internal use. A part of this internal interface can also be used as a public content API. To make this work well, issues important for an external content API need to be considered in the design of the storage service.

Goals

  • Support high request volumes -- provide an efficient API for retrieving content from mobile apps, ESI, bots, etc.
  • Caching support -- no random query parameter URLs that cannot be purged.
  • Support rewriting -- use URL patterns that support URL-based rewriting in something like Varnish.
  • API versioning -- enable evolution of APIs without breaking users unnecessarily.
  • Consistency -- use essentially the same URL scheme externally and internally. Return the same content internally and externally, and make links in content work in both contexts without extensive rewriting.

Resource / URI layout considerations

The design of a URI layout involves a lot of trade-offs, which are discussed in more detail in these notes. Your feedback on this is more than welcome. This is a summary of the current thinking:

API entry point

With user accounts, discussions & media now working across wikis, it seems to make sense to use a central domain for the API in the WMF setup, e.g. api.wikimedia.org. In smaller setups, the /v1/ path prefix can be mapped to the API.

Page sub-resources

Page-related information like revisions or metadata is most naturally represented as sub-resources. The main issue here is that page names can contain slashes. Another issue is that URIs should be deterministic so that they can be cached.

We have decided to encode slashes in path components, so that we can use natural REST paths throughout:

/v1/enwiki/pages/Foo%2FBar/rev/latest/html
  • Slashes in page title percent-encoded
  • Regular REST path for sub-resources
  • Disadvantage: inconsistency with normal read URIs

See the notes for other options considered.
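
As an illustration of the encoding rule above, here is a minimal Python sketch of how a client might build such a path. The helper name `page_html_url` and its parameters are hypothetical, not part of any published client library; the only thing taken from this RFC is the path pattern itself.

```python
from urllib.parse import quote, unquote


def page_html_url(wiki: str, title: str, revision: str = "latest") -> str:
    """Build a content API path for a page's HTML (hypothetical helper)."""
    # safe="" tells quote() to percent-encode "/" as well, so the title
    # always occupies exactly one path segment.
    encoded_title = quote(title, safe="")
    return f"/v1/{wiki}/pages/{encoded_title}/rev/{revision}/html"


print(page_html_url("enwiki", "Foo/Bar"))
# /v1/enwiki/pages/Foo%2FBar/rev/latest/html

# Decoding the segment recovers the original title.
print(unquote("Foo%2FBar"))
# Foo/Bar
```

Because the same title always encodes to the same segment, the resulting URI is deterministic and can be cached and purged by exact match.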

We would like to use relative links in stored content wherever possible. Links should continue to work even if the API is moved to a different path prefix. Page names containing slashes complicate this a bit, as normal browser behavior is to interpret relative links relative to the page name.

The current solution used by Parsoid is to prefix relative links in a page called Foo/Bar/Baz with ../../. Sadly this does not work so well when content fragments from several pages are combined in one output page, for example in Flow timelines. All links in the content would need to be rewritten so that they work with a different page name. Similar issues occur when pages are renamed.

A promising alternative for HTML content is to make all links relative to the wiki root (href="Foo"), and to make this work even for pages whose names contain slashes by setting <base href="/wiki/"> in the skin. This also avoids issues with alternate path-less entry points like index.php?title=foo&.... Setting base href is much cheaper than rewriting all hrefs in content, and it allows content fragments to be combined even where such rewriting is not easily possible (ESI).

This solution won't work for JSON responses though. More research is needed there.
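
The link-resolution behavior described in this section can be reproduced with Python's standard `urljoin`, which follows the same relative-reference rules that browsers apply. This is only a sketch: the host `example.org` and the helper `resolve` are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urljoin

# Hypothetical URL of a page named Foo/Bar/Baz.
PAGE_URL = "https://example.org/wiki/Foo/Bar/Baz"


def resolve(href: str, page_url: str, base_href: Optional[str] = None) -> str:
    """Resolve href the way a browser would, optionally honoring <base href>."""
    # A <base href> is itself resolved against the page URL first.
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)


# Plain relative link: resolved against the page name, which is wrong here.
print(resolve("Quux", PAGE_URL))
# https://example.org/wiki/Foo/Bar/Quux

# Parsoid-style prefix: correct, but tied to the depth of this page name.
print(resolve("../../Quux", PAGE_URL))
# https://example.org/wiki/Quux

# With <base href="/wiki/">: correct regardless of slashes in the page name.
print(resolve("Quux", PAGE_URL, base_href="/wiki/"))
# https://example.org/wiki/Quux
```

The second form breaks as soon as the fragment is embedded in a page with a different number of slashes in its name, while the third depends only on the base set by the skin.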

Content API handled by storage service backend

Example requests to public key-value buckets as mentioned in the storage service RFC:

GET /v1/enwiki/pages/Main_Page/rev/latest/html
GET /v1/enwiki/math-png/96d719730559f4399cf1ddc2ba973bbd.png

See the storage service RFC for more example URLs following the same pattern.
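
One consequence of percent-encoding slashes is that a request path can be split into segments before decoding, so a title such as Foo/Bar never changes the number of segments. The following Python sketch shows the idea; `split_api_path` is a hypothetical helper and does not describe how the storage service actually routes requests.

```python
from urllib.parse import unquote


def split_api_path(path: str) -> list[str]:
    """Split a content API path into decoded segments (illustrative only)."""
    # Split on literal "/" first, then decode each segment, so an encoded
    # %2F inside a page title does not introduce an extra segment.
    segments = path.lstrip("/").split("/")
    return [unquote(segment) for segment in segments]


print(split_api_path("/v1/enwiki/pages/Main_Page/rev/latest/html"))
# ['v1', 'enwiki', 'pages', 'Main_Page', 'rev', 'latest', 'html']

print(split_api_path("/v1/enwiki/pages/Foo%2FBar/rev/latest/html"))
# ['v1', 'enwiki', 'pages', 'Foo/Bar', 'rev', 'latest', 'html']
```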

Useful resources

Structured API specs

  • machine-readable
  • provide rich & auto-generated documentation / sandboxes and mocks
  • ensure that docs reflect the deployed software version & configuration by integrating them with the routing mechanism
  • Overview articles 1, 2
  • Swagger
    • popular & fairly straightforward to use per API end point
    • out of band
  • JSON schema hypermedia: Google API discovery service, Heroku
    • very powerful, but less convenient to use; perhaps a good output format for Swagger
    • supports URL discovery out of band in schemas: good for efficiency
    • Recommends linking the schema using a mime type with a profile link: application/json; profile=http://my.com/myschema.json# [1]
  • JSON Hypertext Application Language RFC: Standard for linking in JSON responses (in-band)
