Wikimedia Engineering/Report/2014/March
Engineering metrics in March:
- 160 unique committers contributed patchsets of code to MediaWiki.
- The total number of unresolved commits went from around 1450 to about 1315.
- About 25 shell requests were processed.
Major news in March includes:
- an overview of webfonts, and the advantages and challenges of using them on Wikimedia sites;
- a series of essays written by Google Code-in students who shared their impressions, frustrations and surprises as they discovered the Wikimedia and MediaWiki technical community;
- Hovercards, now available as a Beta feature on all Wikimedia wikis, which let readers see a short summary of an article just by hovering over a link;
- a subtle typography change across Wikimedia sites for better readability, consistency and accessibility;
- a recap of the upgrade and migration of our bug tracking software.
Note: We're also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.
Personnel
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
- VP of Engineering
- Software Engineer - Growth
- Software Engineer - VisualEditor (Features)
- Software Engineer - Fundraising Team
- Software Engineer - Internationalization
- Software Engineer - Mobile (Frontend)
- Automation Engineer (Ruby)
- Release Engineer
- Research Analyst - Fundraising
- Director of Community Engagement (Product)
- Product Manager - Language Engineering
- Operations Security Engineer
Announcements
- Chase Pettet joined the Wikimedia Operations Team as an Operations Engineer (announcement).
- Following changes in the Engineering Community Team, Quim Gil took over as Engineering Community Manager, and Sumana Harihareswara transitioned to the role of Senior Technical Writer (announcement).
- Kevin LeDuc joined the Wikimedia Foundation as Analytics Product Manager (announcement).
Technical Operations
- Final negotiations and coordination are still ongoing for the data center RFP, but we expect to be able to make an announcement soon.
Wikimedia Labs
Labs metrics in March:
- Number of projects: 149
- Number of instances: 310
- Amount of RAM in use (in MBs): 1,288,704
- Amount of allocated storage (in GBs): 14,925
- Number of virtual CPUs in use: 635
- Number of users: 2,907
- The Labs Ops team has spent the month shepherding projects from the Tampa cloud to the Ashburn cloud. Dozens of volunteers contributed to the move, and all tools and projects have now been copied to or rebuilt in Ashburn. Some projects and tools are in a non-running state pending action on the part of their owners or admins. Ashburn Labs is running OpenStack Havana, with NFS for shared storage.
- The usage stats this month differ considerably from last month's: a number of obsolete instances have been purged, and last month's stats may have included some duplication between data centers.
Tampa data center
- During March, the Ops team decommissioned and shut down many hosts in the old Tampa data center, including all former appservers, greatly reducing the energy consumed there. A few hosts will be migrated to another floor of the existing data center, and physical data center work is coming up.
Editor retention: Editing tools
March saw the Parsoid team continuing with a lot of unglamorous bug fixing and tweaking. Media/image handling in particular received a good amount of love and is now in a much better state than it used to be. In the process, we discovered a lot of edge cases and inconsistent behavior in the PHP parser, and fixed some of those issues there as well.
We wrapped up our mentorship of Be Birchall and Maria Pecana in the Outreach Program for Women. We revamped our round-trip test server interface and fixed some diffing issues in the round-trip test system. Maria wrote a generic logging backend that lets us dynamically map an event stream to any number of logging sinks, a huge step up from the basic console.error-based error logging we had used so far.
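The idea of mapping one event stream to many sinks can be illustrated with a short sketch. It is written in Python for brevity (Parsoid itself is JavaScript) and assumes each event is identified by a type string and each sink registers a regular-expression pattern over those types; `EventLogger`, `register`, `log` and the event types used below are hypothetical names, not Parsoid's actual API.

```python
import re
from typing import Callable, List, Tuple

# A sink receives the event type and the message.
Sink = Callable[[str, str], None]


class EventLogger:
    """Route each event to every sink whose pattern matches its type (hypothetical sketch)."""

    def __init__(self) -> None:
        self._routes: List[Tuple[re.Pattern, Sink]] = []

    def register(self, pattern: str, sink: Sink) -> None:
        # Sinks can be added at any time, so the event-to-sink mapping is dynamic.
        self._routes.append((re.compile(pattern), sink))

    def log(self, event_type: str, message: str) -> int:
        """Deliver the event to all matching sinks and return how many received it."""
        delivered = 0
        for pattern, sink in self._routes:
            if pattern.fullmatch(event_type):
                sink(event_type, message)
                delivered += 1
        return delivered


# Usage: one sink collects only errors, another collects every event.
errors: List[str] = []
everything: List[Tuple[str, str]] = []
logger = EventLogger()
logger.register(r"error|fatal", lambda t, m: errors.append(m))
logger.register(r".*", lambda t, m: everything.append((t, m)))
print(logger.log("error", "parse failed"))           # 2
print(logger.log("trace/dsr", "computing offsets"))  # 1
```

Compared with writing everything through a single console call, the caller only emits typed events; which sinks see them is decided by configuration, so tracing or debug sinks can be switched on without touching the code that logs.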
We also designed and implemented an HTML templating library that combines the correctness and security support of a DOM-based solution with the performance of string-based templating. It is implemented as a compiler from KnockoutJS-compatible HTML syntax to a JSON intermediate representation, plus a small and very fast runtime for that representation. The runtime is now also being ported to PHP in order to gauge its performance there as well. It will also serve as a test bed for further forays into HTML templating for translation messages and, eventually, wiki content.
Core Features
Growth
Support
Yuri continued analytics work on SMS/USSD pilot data. Post hoc analysis of WML usage after its deprecation showed that it remains low, although obtaining more low-end phones to check how well HTML renders on them, and how the HTML could be enhanced, would be useful. Post hoc analysis was also performed on anomalous declines and growth spurts in log lines (not strictly related to pageviews): the declines were largely due to API changes, and the growth spurts largely due to an external polling mechanism.
With the assistance of the Apps team, the team added User-Agent, Send App Feedback, and Random features to the forthcoming reboots of the Android and iOS apps, made the Android Share feature allow a different target app each time, and provided code review for the Android and iOS app code; a proof of concept for full-text search was started on iOS. Bug fixes for Wikipedia for Firefox OS were also pushed to production. Screencap workflows and preload information were also put together for the Android reboot with respect to Wikipedia Zero.
The team worked with Ops on forward planning, given the extremely infrastructure-oriented nature of the program. A quarterly review was held with the ED, the VP of Engineering, and the W0 cross-functional team, which also reviewed the presentation material for publication. The team also continued work on additional proxy and gateway support. To help partner tech contacts, the team worked on reformatting the introductory documentation for technical partners.
Finally, the team explored proactive MCC/MNC-to-IP address drift correction, and will be emailing the community for input soon.
Wikipedia Zero (partnerships)
- Smart, the largest mobile operator in the Philippines, is giving access to Wikipedia free of data charges through the end of April, a promotion they announced in a press release. Ingrid Flores, Wikipedia Zero Partner Manager, visited the Philippines and arranged a meeting with local community members and Smart; they are now exploring ways to collaborate in support of education. The partnerships team kicked off account reviews with the 27 existing Wikipedia Zero partners, to update the implementation, identify opportunities for collaboration in corporate social responsibility (CSR) initiatives, and get feedback on the program. The account reviews will continue for the next few months. Finally, we continued recruiting for a Wikipedia Partner Manager for the Asia region.
Language engineering communications and outreach
Santhosh Thottingal and David Chan continued development and technology research on the Content Translation project. Development was focused specifically on updates to the side-by-side translation editor and section alignment of translated text. Kartik Mistry and Santhosh Thottingal worked on infrastructure for testing the Content Translation server. David Chan continued his technology research on sentence segmentation.
Pau Giner updated the Content Translation UI design specification, incorporating review comments from UX and product reviews. The team also participated in a review of the Content Translation project with the product team leadership.
MediaWiki Core
Security auditing and response
Quality assurance
Quality Assurance/Browser testing
Multimedia
In March, the multimedia team's main project was Media Viewer v0.2, as we completed the final features for the tool's upcoming release next quarter. Gilles Dubuc, Mark Holmquist, Gergő Tisza and Aaron Arcos developed a number of new features, based on designs by Pau Giner: share, embed, download, opt-out preference, file page link and feedback link. We invite you to test the latest version (see the testing tips) and share your feedback.
Fabrice Florin coached the multimedia team as product manager and hosted several planning and review meetings, including a cycle planning meeting (leading to the next cycle plan) and the Multimedia Quarterly Review Meeting for the first quarter of 2014, which summarizes our progress and next steps for coming work (see slides). He also worked with Keegan Peterzell to engage community members for the gradual release of Media Viewer, to be enabled by default on a number of pilot sites next month, then deployed widely to all wikis a few weeks later. For more updates about our multimedia work, we invite you to join the multimedia mailing list.
Project management tools/Review
- Niharika's compact language links are now a Beta feature
- Anu's upload wizard with OSM support
- Diwanshi's Wikipedia API courses in Codecademy 1 and 2
- Brena's prototype of mediawiki.org's redesigned homepage
- Be's clean up of Parsoid's round-trip testing UI is merged
- Maria's clean up of Parsoid's tracing/debugging/logging is merged
Volunteer coordination and outreach
- Who contributes code
- Gerrit review queue
- Code contributors new and gone
- Bugzilla response time
- Top contributors
The first Analytics use case for this system will be Camus, LinkedIn's open source application for loading Kafka data into Hadoop. Once this is productized, we'll be able to regularly load log data from our servers into Hadoop for processing and analysis.
We did some significant architectural work on WikiMetrics this month to prepare it for its role as our recurrent report scheduling and generation system. The first use case for this system will be the Editor Engagement Vital Signs project, which will provide daily updates on key metrics around participation.
Analytics/Logging infrastructure
This month we concluded the first stage of work on metrics standardization. We created an overview of the project with a timeline and a list of milestones and deliverables. We also gave an update on metrics standardization during the March session of the Research and Data monthly showcase. The showcase also hosted a presentation by Aaron Halfaker on his research on the impact of quality control mechanisms on the growth of Wikipedia.
We published an extensive report from a session on Wikipedia research that we hosted at CSCW '14, in which we discussed with academic researchers and students how to work with researchers at the Foundation.
We submitted 8 session proposals for Wikimania '14, authored or co-authored by members of the research team.
We attended the Analytics team's Q3 quarterly review during which we presented the work performed by the team in the past quarter and our goals for the upcoming quarter (April-June 2014).
We completed the handover of Fundraising analytics tools and knowledge transfer in preparation for a new full-time research position that we will be opening shortly to support the Fundraising team.
We continued to support teams in our focus areas (Growth and Mobile) with an analysis of the impact of the rollout of the new onboarding workflows across multiple wikis, an analysis of mobile browsing sessions, and ongoing analysis of mobile user acquisition tests. We also supported the Ops team in measuring the impact of the deployment of the ULSFO cluster, which provides caching for the western USA and East Asia.
The Kiwix project is funded and executed by Wikimedia CH.
- This month, we released a new version of Kiwix for Android that adds support for older versions of Android like Gingerbread; about 50% more devices than before are now supported.
The Wikidata project is funded and executed by Wikimedia Deutschland.
- The team worked on making ranks more useful. From now on, the property parser function and Lua by default return the values with the "preferred" rank or, when none is available, those with the "normal" rank. This makes it possible, for example, to exclude past mayors when asking Wikidata for the mayor of a city. Additionally, considerable speed improvements have been made; browsing Wikidata is now a lot faster. Diffs between versions of pages on Wikidata have also been improved to make it easier to see what changes were made to an item. Last but not least, research on the user interface redesign continued.
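The default rank selection rule described above can be sketched in a few lines. This is a minimal illustration in Python, assuming a statement is just a value plus one of Wikidata's three ranks (preferred, normal, deprecated); the `Statement` class, the `best_statements` function and the mayor names are hypothetical and do not reproduce Wikibase's actual code.

```python
from dataclasses import dataclass
from typing import List

# Wikidata's three statement ranks.
PREFERRED = "preferred"
NORMAL = "normal"
DEPRECATED = "deprecated"


@dataclass
class Statement:
    """Hypothetical minimal statement: a value plus its rank."""
    value: str
    rank: str


def best_statements(statements: List[Statement]) -> List[Statement]:
    """Return preferred-rank statements if any exist, otherwise normal-rank ones.

    Deprecated statements are never returned by this default rule.
    """
    preferred = [s for s in statements if s.rank == PREFERRED]
    if preferred:
        return preferred
    return [s for s in statements if s.rank == NORMAL]


# Example: a city's mayors (hypothetical names).
mayors = [
    Statement("Former mayor A", NORMAL),
    Statement("Former mayor B", NORMAL),
    Statement("Current mayor C", PREFERRED),
]
print([s.value for s in best_statements(mayors)])  # ['Current mayor C']
```

Because past mayors keep the normal rank while the current one is marked preferred, the default lookup skips the historical values without removing them from the item.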
Future
- The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the annual goals, listing ongoing and future Wikimedia engineering efforts.