Problem
MessageCache::load uses the WANObjectCache (Memcached for WMF), via MessageCache::saveToCaches(), to save several different cache keys (the hash, the check key, and the key:message blob) on a per-language basis. There is $wgMaxMsgCacheEntrySize to prevent the value from becoming too large, but it only limits large individual messages within the blob, not the total blob size. Even with gzip, we have blobs approaching 800KB (metawiki, en). Although most hits should come from APC anyway, if the blob ends up too big (larger than the 1MB memc limit, and thus un-settable), then edits to MW: pages would cause various problems.
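To make the size concern concrete, here is a minimal sketch of checking a compressed blob against the limit. It is written in Python purely for illustration and is not MediaWiki code: the function names are hypothetical, JSON stands in for PHP serialization, and the 1 MB figure is the memc limit mentioned above.

```python
import json
import zlib

# Assumption: the 1MB memc item limit referred to in this task.
MEMCACHED_ITEM_LIMIT = 1024 * 1024


def compressed_blob_size(messages):
    """Return the compressed size in bytes of a message name -> text map.

    JSON is only a stand-in for the real serialization format.
    """
    payload = json.dumps(messages, ensure_ascii=False).encode("utf-8")
    return len(zlib.compress(payload))


def fits_in_memcached(messages, limit=MEMCACHED_ITEM_LIMIT):
    """True if the whole blob could be stored as a single cache item."""
    return compressed_blob_size(messages) <= limit
```

A per-message cap such as $wgMaxMsgCacheEntrySize does not imply this check passes, since many medium-sized messages can still add up to more than the limit.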
Disaster scenario
At the time of writing, this problem mainly affects Meta-Wiki. The other two consumers of MediaWiki namespace pages for content that is not "interface message overrides" (site scripts and gadgets) are important to think about, but are minor in comparison to the years of accumulated CentralNotice banners, variations, and translations.
The following is what would happen if the 1MB size were exceeded.
- All servers will block on a global lock to de-duplicate regeneration effort for a value that is only storable locally in APC. They will log 'global cache is presumed expired' around purge time and 'global cache is empty' afterwards. Blocking on getReentrantScopedLock() will be a waste unless that thread was on the server in question itself. If not, there will be another iteration in the loop, which will either block again or reach loadFromDB(). In the former case, $failedAttempts is spent and the $staleValue (from APC) is used.
- If there is no APC value at all (not just an expired one), then there would be some slow requests doing regeneration, as well as many more requests failing to load anything for the MessageCache instance, logged as 'waited for other thread to complete'. This is due to the stampede protection from the non-blocking getReentrantScopedLock() call (combined with global key failure).
Solutions
- Automatic shrink: It might be useful to check the total size of the message name/text map and, if it is too big, have some items (largest first) use the individual key logic.
- Limit to localisation overrides: It is also worth considering whether a message key appearing in the title of a MW: page is defined in i18n code and overridden, or whether the message is arbitrary, or whether the name is dynamic and used by some extension (e.g. messages with a magic prefix). Those could perhaps be cached differently, always or at least when the combined blob can't fit everything.
- Reduce time to rebuild blob: It would also be nice if unchanged keys could avoid all of the ExternalStore fetch logic by having page_latest in the cache (or an integer version of page_touched).
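The "automatic shrink" idea could look roughly like the sketch below. It is Python for illustration only, not a proposed patch: `shrink_blob`, `EXTERNAL_MARKER`, and the `size_fn` parameter are hypothetical names, and JSON plus zlib stand in for the real serialization and compression.

```python
import json
import zlib

# Hypothetical placeholder meaning "fetch this message via its individual key".
EXTERNAL_MARKER = "!EXTERNAL"


def compressed_size(messages):
    """Compressed size in bytes of the map (JSON is a stand-in format)."""
    payload = json.dumps(messages, ensure_ascii=False).encode("utf-8")
    return len(zlib.compress(payload))


def shrink_blob(messages, limit, size_fn=compressed_size):
    """Replace the largest messages with a marker until the blob fits.

    Returns (inline_map, externalized_keys). The result can still exceed
    the limit if the markers alone do not fit; callers should re-check.
    """
    inline = dict(messages)
    externalized = []
    # Visit keys from largest message text to smallest.
    by_size = sorted(
        messages,
        key=lambda k: len(messages[k].encode("utf-8")),
        reverse=True,
    )
    for key in by_size:
        # Re-measuring each round is simple but quadratic in the worst case.
        if size_fn(inline) <= limit:
            break
        inline[key] = EXTERNAL_MARKER
        externalized.append(key)
    return inline, externalized
```

Keeping a marker in the blob, rather than dropping the key, lets a reader tell "stored elsewhere" apart from "no such message" without an extra lookup.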