- Affected components: TBD.
- Engineer for initial implementation: TBD.
- Code steward: TBD.
Motivation
Our mobile apps (namely the Android app, for now) will need to synchronize certain data with the user's account on Wikipedia, so that the user's data will persist across different devices, and be accessible from different platforms. This includes various preferences that the user sets within the app, as well as more complex user-created structures like private reading lists, which are currently being developed client-side.
While it is possible to use userjs preferences for storing this information, it becomes impractical for the more complex data (such as reading lists), because all userjs options are transmitted with each pageview for a logged-in user, which would make pageview payloads inefficient for heavy users of these features.
Requirements
(Specify the requirements that a proposal should meet.)
- …
Exploration
Proposal: Authenticated key-value store
Implement a simple, private, per-user key-value storage API.
Each user will have their own keyspace, and the keyspace used will always be that of the currently authenticated user. There will be no access to other users' storage other than by logging into the other user's account (e.g. in a separate session). This avoids one of the major complaints about Gather: because Gather lists were publicly visible, they had to be policed for policy violations, and the community was not inclined to do that policing.
The store will provide no "revision history" and no logging: when a value is updated or deleted, the old value is erased without possibility of recovery. Logging and/or history are required when a resource may be changed by multiple users or is publicly visible, neither of which is the case here. Omitting them reduces the complexity of the implementation significantly.
At a minimum, the store will support get, set, add, and delete operations. Ideally, CAS will be supported for modifications, and batch operations (e.g. multiple gets or sets in one request) will be allowed.
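To make the operation set above concrete, here is a minimal in-memory sketch in Python. It is illustrative only and not a proposed implementation: the class name `UserKeyValueStore` and all method names are hypothetical, `add` is assumed to mean "store only if the key does not already exist", and CAS is assumed to compare against the current value (a real implementation might use a version token instead). The `user` argument stands in for the currently authenticated user.

```python
from typing import Dict, Iterable, Optional


class UserKeyValueStore:
    """Hypothetical in-memory sketch: one private keyspace per user, no history."""

    def __init__(self) -> None:
        # Maps a user identifier to that user's private key/value mapping.
        self._data: Dict[str, Dict[str, str]] = {}

    def _keyspace(self, user: str) -> Dict[str, str]:
        # Every operation is scoped to the given user's keyspace only.
        return self._data.setdefault(user, {})

    def get(self, user: str, key: str) -> Optional[str]:
        return self._keyspace(user).get(key)

    def get_many(self, user: str, keys: Iterable[str]) -> Dict[str, Optional[str]]:
        # Batch read: missing keys map to None.
        keyspace = self._keyspace(user)
        return {key: keyspace.get(key) for key in keys}

    def set(self, user: str, key: str, value: str) -> None:
        # Overwrites silently; the old value is not kept anywhere.
        self._keyspace(user)[key] = value

    def add(self, user: str, key: str, value: str) -> bool:
        # Assumed semantics: succeed only if the key is not already present.
        keyspace = self._keyspace(user)
        if key in keyspace:
            return False
        keyspace[key] = value
        return True

    def delete(self, user: str, key: str) -> bool:
        # Returns True if a value was actually removed.
        return self._keyspace(user).pop(key, None) is not None

    def cas(self, user: str, key: str, expected: Optional[str], new: str) -> bool:
        # Assumed semantics: write only if the current value equals `expected`
        # (None meaning "key is absent").
        keyspace = self._keyspace(user)
        if keyspace.get(key) != expected:
            return False
        keyspace[key] = new
        return True
```

In this sketch a failed `add` or `cas` simply returns `False`, leaving it to the client to re-read and retry. How failures should be reported in the actual API is not specified here.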
Open questions:
- Should this be implemented as a MediaWiki action API endpoint or a RESTBase service?
  - As a MediaWiki action API endpoint, it would be available in all MediaWiki installations without further effort and could potentially reuse existing code for communicating with storage backends. @Anomie will likely write and maintain it in this case.
  - As a RESTBase service, it might be easier to integrate a backend that isn't already supported by MediaWiki, and the input format wouldn't necessarily be constrained to being equivalent to HTTP form posts. A developer willing to create and maintain it would need to be found.
- What backend should be used to store the data?
  - If we go the action API route: The easy solution would be an SQL table, much like the existing user_properties table. On the other hand, with a little effort we could abstract the backend so that different solutions can be plugged in without rewriting everything; in this case, would it be best to use an existing abstraction such as BagOStuff or create a new one?
- What limits should be placed on the implementation?
  - Key length? (For comparison, user_properties limits keys to 255 bytes.)
  - Value length? (For comparison, user_properties limits values to 65535 bytes.)
  - Total number of keys or total value size (per user)?
- Should there be one store per wiki, or a global store? Or, in other words, should using the store require a centralized account?
- Should expiration be supported?
- Should enumeration of keys be supported? For example, "return all keys with prefix 'foo'".
- Should non-string values be natively supported in some manner?
  - We recommend no. Clients may store non-string values in a serialized format (e.g. JSON), or they may use one key per value and an additional "index" key if necessary.
- Should "tagging" be natively supported in some manner?
  - We recommend no. Clients wanting tagging can easily enough implement it on top of the existing storage by using a key to store the list of keys having a particular tag.
- Does anyone have ideas for preventing misuse (cf. Commons being used for illegal file sharing) besides setting a relatively low limit on total data per user?
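
The client-side approaches recommended in the questions above (serializing non-string values, and implementing tagging with a key that lists the tagged keys) can be sketched as follows. This is only an illustration under stated assumptions: a plain Python dict stands in for one user's string-only keyspace, and the helper names and the `tag:` and `list:` key prefixes are hypothetical, not part of the proposal.

```python
import json
from typing import Any, Dict, List

# Stand-in for one user's string-only keyspace (assumption: in a real client
# these reads and writes would be API requests to the store).
store: Dict[str, str] = {}


def put_json(key: str, value: Any) -> None:
    # Non-string values are serialized to a JSON string before storing.
    store[key] = json.dumps(value)


def get_json(key: str, default: Any = None) -> Any:
    raw = store.get(key)
    return default if raw is None else json.loads(raw)


def tag_key(tag: str) -> str:
    # Hypothetical naming convention for the key that holds a tag's members.
    return "tag:" + tag


def add_tag(tag: str, key: str) -> None:
    # The tag is itself just a key whose value is a JSON list of tagged keys.
    keys: List[str] = get_json(tag_key(tag), [])
    if key not in keys:
        keys.append(key)
        put_json(tag_key(tag), keys)


def keys_with_tag(tag: str) -> List[str]:
    return get_json(tag_key(tag), [])


# Usage: store a reading list as JSON and tag it.
put_json("list:1", {"title": "Physics", "pages": ["Quark", "Lepton"]})
add_tag("science", "list:1")
```

Note that `add_tag` is a read-modify-write sequence, so two devices tagging concurrently could overwrite each other's update. This is one reason CAS support for modifications would be useful to clients.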