MediaWiki utilities is a collection of simple, sharp tools for extracting and processing MediaWiki data. These libraries are inspired by the Unix philosophy. Each library is designed to *do one thing and do it well*, and the libraries are designed to *work together*. Where applicable, they also include Unix-style command-line utilities that *handle text streams, because that is a universal interface*.
In this session, I'll introduce participants to the utilities that are already available. Specifically, I'll demo the XML parser, which is both easy to use and powerful, by processing Wikimedia's massive XML dumps. Then we'll talk about new work on current utilities and the development of new utilities.
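To give a feel for what stream-processing an XML dump involves, here is a minimal sketch using only Python's standard library. It is not the API of the MediaWiki utilities themselves; the function name `iter_revisions`, the helper `local`, and the tiny `SAMPLE_DUMP` are assumptions made up for illustration. The idea is to yield one revision at a time so that memory use stays flat no matter how large the dump is.

```python
import io
import xml.etree.ElementTree as ET

# A tiny stand-in for a dump file (hypothetical data; real dumps are far
# larger and declare an XML namespace on the root element).
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Foo</title>
    <revision><id>1</id><text>first</text></revision>
    <revision><id>2</id><text>second draft</text></revision>
  </page>
  <page>
    <title>Bar</title>
    <revision><id>3</id><text>hello</text></revision>
  </page>
</mediawiki>"""


def local(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(stream):
    """Yield (page_title, revision_id, text) tuples one revision at a time."""
    title = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            # Only direct children are inspected, so nested ids are ignored.
            fields = {local(child.tag): child.text for child in elem}
            yield title, int(fields["id"]), fields.get("text") or ""
            elem.clear()  # drop the revision's children to free memory
        elif name == "page":
            elem.clear()
            title = None


for page_title, rev_id, text in iter_revisions(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))):
    print(page_title, rev_id, len(text))
```

Because the generator clears each element once it has been consumed, the same loop could in principle be pointed at a decompressed dump file opened in binary mode.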
Proposed new utilities:
- mwrefs -- Handle <ref> extraction, bibliography extraction and metadata fetching for academic identifiers.
- mwmetrics -- Standardized library for deploying quality and behavioral metric strategies
- mwviews -- Parsing old view logs, accessing new pageview APIs, etc.
- mwdiscussions -- Parsing utilities for analyzing discussion pages
- Etherpad: https://etherpad.wikimedia.org/p/WikiDev16-T114247
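As a rough illustration of the kind of task proposed for mwrefs, the sketch below pulls the contents of `<ref>` tags out of wikitext and looks for DOI-like identifiers inside them. Since mwrefs is only a proposal here, nothing below reflects its actual design; `extract_refs`, `extract_dois`, and both regular expressions are hypothetical, and a regex approach like this will miss edge cases such as nested tags or attribute values containing a slash.

```python
import re

# Matches <ref ...>content</ref>, skipping self-closing <ref ... /> tags.
# (Illustrative pattern only; not taken from any existing utility.)
REF_RE = re.compile(r"<ref\b[^>/]*>(.*?)</ref>", re.DOTALL | re.IGNORECASE)

# A loose pattern for DOI-like strings such as 10.1000/xyz123.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s|}<]+")


def extract_refs(wikitext):
    """Return the stripped contents of each non-empty <ref> tag."""
    return [match.group(1).strip() for match in REF_RE.finditer(wikitext)]


def extract_dois(wikitext):
    """Return DOI-like identifiers found inside <ref> tags."""
    dois = []
    for ref in extract_refs(wikitext):
        dois.extend(DOI_RE.findall(ref))
    return dois


sample = (
    "Water is wet.<ref>{{cite journal |title=Wetness |doi=10.1000/xyz123}}</ref> "
    'See also.<ref name="a" /> '
    'More.<ref name="b">Smith 2010, p. 4</ref>'
)

for ref in extract_refs(sample):
    print(ref)
print(extract_dois(sample))
```

In keeping with the "one thing well" idea, extraction and identifier matching are kept as separate functions so that either could be swapped out independently.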