Wikipedia:Wikipedia Signpost/Single/2009-06-22
Study of vandalism survival times
- Loren Cobb (User:Aetheling) holds a Ph.D. in mathematical sociology and is a research professor in the Department of Mathematical and Statistical Sciences at the University of Colorado Denver.
This study has a narrow focus: to determine the distribution of the length of time that vandalism remains on the English-language Wikipedia. This distribution is also known as the survival function for vandalism. The two primary results from this study are: (a) the median time to correction is down to four minutes, and (b) some subtle forms of vandalism still persist for months and even years.
In the past there have been other statistical studies, both formal and informal, of how long vandalism remains in Wikipedia until it is corrected, but almost all of them express their results as a mean time to correction (i.e., as a simple arithmetic average of the observed times). I will show in this study that the distribution function for time to correction has such a fat tail that the mean time to correction is both mathematically and substantively meaningless. The median time to correction, on the other hand, conveys useful information.
Methods
A random sample of 100 articles from the English-language edition of Wikipedia was obtained through the use of the random article link in the navigation toolbar. For each article, the history log was used to examine each recorded change, starting from the most recent and going back until a clear instance of vandalism was found. Then the changes were scanned in the opposite direction, going forward in time until the vandalism was corrected.
For each such instance of vandalism, the elapsed time until correction was computed, in minutes. These are the fundamental data on which this report is based.
In addition, some notes were taken on the general nature of the vandalism. All data collection occurred on 2009-06-11.
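The elapsed-time computation described above can be sketched as follows. This is a minimal illustration, not the author's actual procedure: the function name `elapsed_minutes` and the sample timestamps are hypothetical, and rounding down to whole minutes is an assumption (it is consistent with the zero-minute values in the raw data for corrections made in less than one minute).

```python
from datetime import datetime


def elapsed_minutes(vandalized_at: datetime, corrected_at: datetime) -> int:
    """Whole minutes between a vandalizing edit and the edit that corrected it."""
    if corrected_at < vandalized_at:
        raise ValueError("the correction cannot precede the vandalism")
    # Assumption: partial minutes are dropped, so 59 seconds counts as 0 minutes.
    return int((corrected_at - vandalized_at).total_seconds() // 60)


# Hypothetical history-log timestamps (not taken from the study).
vandalized = datetime(2009, 6, 10, 14, 2, 30)
corrected = datetime(2009, 6, 10, 14, 6, 45)
print(elapsed_minutes(vandalized, corrected))

# Lower bound for the uncorrected 2007-02-23 case as of the study date,
# assuming midnight for both timestamps (times of day are not given).
print(elapsed_minutes(datetime(2007, 2, 23), datetime(2009, 6, 11)))
```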
Results
- Of the 100 articles, fully 75 had never been vandalized.
- Of the 25 articles that were vandalized at least once, the most recent such instance of vandalism was eventually corrected in 23 articles.
- In five (20%) of the vandalized articles, the most recent instance of vandalism was corrected in less than one minute. A further four instances were corrected in less than two minutes.
- The median time to correction was four minutes.
- Two articles were found to have suffered vandalism that was never corrected. One of these was a subtle act of vandalism that was committed on 2007-02-23 and had still not been detected by the date of the study, 2009-06-11.
Discussion
A histogram of times to correction is shown in the chart to the right. Note that the horizontal axis is depicted on a logarithmic scale, to accommodate its enormously long right-hand tail.
In this histogram there are evidently two separate processes at work. The bulk of the histogram follows a curve that declines as a power function of elapsed time: this is the process by which ordinary readers and editors of Wikipedia stumble across and correct instances of vandalism.
The first two bars on the left, however, are significantly higher than the curve would suggest. The difference between the actual height of the bars and the height predicted by the curve is accounted for by the independent activity of Wikipedia's Recent Change Patrol (RCP). Members of the RCP typically monitor the Recent Change Log for suspicious edits. The RCP is able to correct most blatant vandalism within seconds of occurrence.
Both of these vandalism-correction processes act in concert to produce a remarkable result: the median time to correction for vandalism in this study was found to be just four minutes. Similar (unpublished) studies performed by this author one and two years ago yielded median times to correction of five and six minutes, respectively. It seems apparent that Wikipedia is improving its already impressive rate of vandalism detection and correction.
Problems with Mean Time to Correction
The fact that the estimated curve for the survival function is exponential on a graph whose horizontal axis is logarithmic indicates that the probability density function itself follows a power law distribution, also known as a Pareto distribution. In its standard form, with shape parameter $\alpha$ and minimum time $t_{\min}$, this is given by the formula
$$f(t) = \frac{\alpha\, t_{\min}^{\alpha}}{t^{\alpha+1}}, \qquad t \ge t_{\min}.$$
If the shape parameter $\alpha$ in the above formula is less than one, as it is in this case, then the mean of the distribution is infinite. The practical significance of this unusual situation is that any sample mean calculated from empirical data conveys absolutely no information whatsoever about the typical length of time that it takes for an instance of vandalism to be corrected.
The only useful alternative to a sample mean in this situation is the sample median, which is fully robust with respect to long-tailed distributions.
Depending upon what assumptions are made concerning the rate of activity of the RCP, the parameter for the Pareto distribution lies in a range between about 0.25 and 0.40. This range is comfortably below one, indicating that the tail of the distribution is huge and that sample means are completely and utterly useless for describing the data.
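As a worked check of why the mean fails while the median survives, the following derivation assumes the textbook Pareto form with shape parameter $\alpha$ and minimum time $t_{\min}$. These symbols are assumptions made for illustration; the study's own fit may be parameterized differently, and the zero-minute observations mean a pure Pareto law is only an idealization of the data.

```latex
% Assumed (textbook) Pareto density, survival function, mean and median.
\begin{align*}
f(t) &= \frac{\alpha\, t_{\min}^{\alpha}}{t^{\alpha+1}}, \qquad t \ge t_{\min} > 0, \\
S(t) &= \Pr(T > t) = \left(\frac{t_{\min}}{t}\right)^{\alpha}, \\
\mathrm{E}[T] &= \int_{t_{\min}}^{\infty} t\, f(t)\, dt
  = \alpha\, t_{\min}^{\alpha} \int_{t_{\min}}^{\infty} t^{-\alpha}\, dt
  = \infty \quad \text{for } \alpha \le 1, \\
S(m) &= \tfrac{1}{2} \;\Longrightarrow\; m = t_{\min}\, 2^{1/\alpha}.
\end{align*}
```

Under these assumptions the median $m$ is finite for every $\alpha > 0$: it equals $16\, t_{\min}$ at $\alpha = 0.25$ and about $5.7\, t_{\min}$ at $\alpha = 0.40$, whereas the integral for the mean diverges across that whole range.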
Observations on types of vandalism
About 84% of the vandalism that I observed in this random sample seemed to be just adolescent fooling around. Of the 16% that appeared more adult, half seemed to be adult humor or anger, and half seemed to come from people whose intent was to leave a permanent but nearly invisible mark upon Wikipedia. For example, the perpetrator will carefully change the spelling of an obscure name to an incorrect form, or change a location to something that still looks plausible at first glance. I imagine them coming back over and over again to the page that they altered, to see if that subtle little change is still there. Perhaps this impulse is roughly the same as the one which causes people to carve their initials into trees, or to scratch them on rocks.
Conclusions
The fact that 50% of all vandalism is being detected and reverted within an estimated four minutes of appearance should go a long way to allay fears about the susceptibility of English-language Wikipedia articles to malicious vandalism. On the other hand, the fact that an estimated 10% of all vandalism endures for months and even years indicates that some new tools and strategies are needed for rooting out the most subtle and persistent forms of vandalism.
Raw data
The elapsed times (in minutes) to correction for the instances of vandalism found in this study were as follows: { 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 4, 5, 8, 9, 19, 73, 213, 490, 672, 2442, 14176, 152996 }. In addition, two cases of vandalism had never been corrected (until discovered by the author).
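The contrast between the median and the mean can be checked directly on the 23 corrected cases listed above. The sketch below uses only Python's standard `statistics` module; the variable names are illustrative, and the two never-corrected cases are left out because they have no finite elapsed time.

```python
import statistics

# Elapsed times to correction in minutes, copied from the list above.
corrected = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 4, 5, 8, 9, 19, 73,
             213, 490, 672, 2442, 14176, 152996]

median_minutes = statistics.median(corrected)
mean_minutes = statistics.mean(corrected)

# Dropping only the single longest time shows how strongly the
# sample mean depends on the far right-hand tail.
mean_without_longest = statistics.mean(sorted(corrected)[:-1])

print(f"median: {median_minutes} minutes")
print(f"mean: {mean_minutes:.2f} minutes")
print(f"mean without the longest case: {mean_without_longest:.2f} minutes")
```

On these values the median is 4 minutes, while the mean is roughly 7,440 minutes (more than five days) and falls to about 824 minutes once the single longest case is removed. A statistic that moves this much when one observation is dropped says little about a typical correction time.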
Reader comments
Wikizine, video editing, milestones
Wikizine
A large new edition of Wikizine is out: "Year: 2009 Week: 29 Number: 108". It includes news about the LiquidThreads extension, various Wikimedia Foundation announcements and goings-on, privacy issues with traffic analysis services that were installed on two Wikipedias, a Wikimedia Canada meeting and Wiki-Conference New York, and more.
Video editing coming soon?
According to an article in MIT's Technology Review, "Wikipedia Gets Ready for a Video Upgrade", Wikipedia will see dramatic improvements in video capabilities rolled out within the next few months.
On the Commons-l mailing list, Casey Brown described the article:
They just put together all of the mini-updates about Michael Dale/Kaltura's
work that we've been getting for months now.
- http://wikimediafoundation.org/wiki/Collaborative_Video
- http://blog.wikimedia.org/2008/07/23/kaltura-sponsors-michael-dale-open-source-video-developer/
- http://metavid.org/blog/2009/03/27/add-media-wizard-and-firefogg-on-test-wikimediaorg/
- http://techblog.wikimedia.org/author/mdale/
- public svn
- lots of wikitech-l/commons-l/foundation-l notes
The article just put all the snippets together into a solid update for people outside our community. :-)
Milestones
- The Russian Wikipedia has reached 400,000 articles (Страсти по Матфею (Бах)) (St Matthew Passion).
- The Belarusian (Taraškievica orthography) Wikipedia has reached 20,000 articles with page be-x-old:Казімер Нарбут.
- The Turkmen Wiktionary has reached 2,000 entries.
- The Occitan Wiktionary has reached 10,000 entries.
- The Esperanto Wikipedia has reached 20,000 registered users.
Wikipedia impacts town's reputation, assorted blogging
Palmerston North entry dissuades overseas professionals
According to New Zealand's stuff.co.nz, overseas investors and doctors have been shying away from Palmerston North because its Wikipedia article described it as being a particularly crime-prone area, with a particular emphasis on gang violence. MidCentral Health consultant Christine Wood described how doctors from Israel and Germany declined to work in Palmerston North after reading the Wikipedia entry. The inclusion of the crime section was criticized because of the lack of such a section in other New Zealand entries, such as Auckland and Hamilton. Palmerston North's city council responded by "toning down" the section.
In the blogosphere
- The discussion about paid editing (which is now focused on the proposed guideline Wikipedia:Paid editing) was picked up this week by the CNN SciTechBlog: "Wikipedia editors for hire".
- Andrew Lih was featured in a Q&A on the New York Times Freakonomics blog: "By a Bunch of Nobodies: A Q&A With the Author of The Wikipedia Revolution".
- On his blog The Wikipedian, William Beutler follows up with the artist who created a comically thick print edition of Wikipedia: "Wikipedia on Dead Tree Redux".
- Search engine optimizer Michael Gray raised the hackles of the Wikipedia community with a post on "How to Invalidate Wikipedia Articles"
Discussion Reports And Miscellaneous Articulations
The following is a brief overview of discussions taking place on the English Wikipedia and other Wikimedia projects.
Note: Starting with this issue, a notice will be placed next to items which have been added since the last issue, for easier locating of discussions which you may not have known about.
Policy
- New! Request for comment: Self electing groups: Should "unofficial" electable groups of Wikipedians be allowed?
- Request for comment: Paid editing: How should paid editing be handled? Is it perfectly allowed? Or is it a blockable action? Something in the middle? (See related story)
- Request for comment: Notability and fiction: How should notability work with regards to fiction? Numerous possible proposals have been put forth.
Style
- New! Request for comment: Full-date unlinking bot: Should a bot be allowed to unlink dates under this proposal? Specifically, unlinking only full dates with day, month, and year information, and not editing the same page twice to do so in case the edit is reverted? So far the community seems supportive of this proposal.
- Request for comment: Should the relatively new template, {{italic title}}, be used to italicize names? If so, what articles should it be used for?
- Discussion: Should guidelines be adopted for what order talk page templates should be sorted in? There is a draft of the proposed guideline.
- Request for comment: Should a bot "fix" section levels when they are skipped (e.g., changing a level 2 header followed by a level 4 header to a level 2 header followed by a level 3 header)? There are currently 20 supporters and 9 opposers.
Technical
Open bot requests for approval
This is a list of current bot requests for approval, with brief descriptions of the proposed tasks. See this week's technology report for information on recently-approved bots.
- New! AnomieBOT 31: To move {{translated page}} from articles to talk pages.
- Coreva-Bot 2: To add maintenance tags to articles.
- CSDCheckBot: To notify users who tagged an article for speedy deletion if that article was not deleted, or was deleted under a different criterion from the one they selected.
- New! DrilBot 3: To tag image files where the image license migration would be redundant.
- Erik9bot 9: To tag articles with {{unreferenced}} if it can't find any evidence of references.
- Erwin85Bot 8: To notify major article contributors when an article is nominated for deletion.
- New! MondalorBot: To clean up interwikis and rename categories.
- NKbot 2: To delete pages in Category:Temporary Wikipedian userpages.
- NNBot II: To enforce rules regarding the removal of speedy deletion templates.
- OgreBot: To update sports scores on a regular basis.
- SPCUClerkbot 3: To modify all existing checkuser/sockpuppet related templates to one single template: {{sockpuppet}} with correctly named parameters.
- Thehelpfulbot 9: To recategorize pages when a category is "moved".
- UnitBot: To fix articles that describe unit conversion to "a ridiculous degree of precision"
Other
Open requests for adminship
The following requests for adminship are currently open (numbers indicate support/oppose/neutral voting, and are updated every half hour):
- New! Cool3 4: Final (55/7/1); closed by Rlevse at 17:57, 27 June 2009 (UTC).
- New! Jarry1250: Final (77/2/1); closed by EVula at 16:33, 24 June 2009 (UTC).
- New! Patar knight: Final (52/7/2); closed by Kingturtle at 3:11, 28 June 2009 (UTC).
- New! Plastikspork: Final (52/7/6); closed by bibliomaniac15 at 22:39, 25 June 2009 (UTC).
- New! Timmeh 2: Final (55/37/10); withdrawn by candidate.
- New! Wtmitchell: Final (65/1/4); closed by Rlevse at 12:13, 26 June 2009 (UTC).
Approved this week
Administrators
Two editors were granted admin status via the Requests for Adminship process this week: Ched Davis (nom) and Mazca (nom).
Bots
This section is now included in the Technology Report, and contains an expanded description of the bots that have been approved. This week's article.
Featured pages
Eighteen articles were promoted to featured status this week: Moltke class battlecruiser (nom), Ten Commandments in Roman Catholicism (nom), Hastings Ismay, 1st Baron Ismay (nom), Ice hockey at the Olympic Games (nom), Jarome Iginla (nom), Yamato class battleship (nom), Magnetosphere of Jupiter (nom), Albert Bridge, London (nom), BP Pedestrian Bridge (nom), Abu Nidal (nom), Brazilian battleship Minas Geraes (nom), Fantasy Black Channel (nom), Otto Becher (nom), Bill Ponsford (nom), Early life of Keith Miller (nom), On the Origin of Species (nom), Battle of the Coral Sea (nom) and John Douglas (architect) (nom).
Seven lists were promoted to featured status this week: List of members of the International Ice Hockey Federation (nom), The Simpsons (season 14) (nom), List of Mexican National Trios Champions (nom), Rawlings Gold Glove Award (nom), List of Philippine–American War Medal of Honor recipients (nom), Commandant of the Marine Corps (nom) and List of United States Military Academy alumni (engineers) (nom).
One topic was promoted to featured status this week: Towns in Trafford (nom).
One portal was promoted to featured status this week: Portal:Connecticut (nom).
The following featured articles were displayed on the Main Page this week as Today's featured article: Richmond Bridge, Euclidean algorithm, Akutan Zero, In Utero, Iridium, Emily Dickinson.
Former featured pages
No articles were delisted this week.
Two lists were delisted this week: List of mergers and acquisitions by Expedia (nom) and List of mergers and acquisitions by Dell (nom).
One topic was delisted this week: Numbered highways in Amenia (CDP), New York (nom).
Featured media
The following featured pictures were displayed on the Main Page this week as picture of the day: Seven Rila Lakes, Gerald Ford, Map by Pedro Reinel, Arborist, Common Grass Blue, Lunar Lander Challenge and Leucospermum.
No featured sounds were promoted this week.
One featured picture was demoted this week: Cathédrale de Nantes (nom).
Twelve pictures were promoted to featured status this week and are shown below.
Bugs, Repairs, and Internal Operational News
This is a summary of recent technology and site configuration changes that affect the English Wikipedia. Please note that some bug fixes or new features described below have not yet gone live as of press time; the English Wikipedia is currently running version 1.44.0-wmf.8 (f08e6b3), and changes to the software with a version number higher than that will not yet be active. Configuration changes and changes to interface messages, however, become active immediately.
Bots approved
4 bots or bot tasks were approved for operation this week. These were:
- AnomieBOT 32, for the creation of isotope-based redirects;
- Mr.Z-bot 7, to report certain abuse filter violations to the administrator intervention against vandalism noticeboard;
- Kwjbot 3, to fix double redirects;
- EdwardsBot, to deliver newsletters for Amazing Race Wikipedia.
This week's discussion report contains information on current bot requests and related discussions.
Bug fixes
- The API no longer flags pre-April 2008 edits, retrieved using list=usercontribs, as new edits. (r52096, bug:19271)
New features
- MediaWiki:Sp-contributions-footer-anon is now shown for all IPs when viewing Special:Contributions, rather than appearing only when the IP has made edits. (r52174, bug:19294)
- The user and excludeuser parameters have been added to the API for list=recentchanges and list=watchlist. (r52152, bug:14200)
- The API now returns an HTTP 503 status code for maxlag errors. (r52190)
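For bot operators, the two API changes above might be used roughly as follows. This is a hedged sketch rather than official client code: it only builds a query URL and makes no network request. The prefixed parameter names `rcuser` and `rcexcludeuser` are assumptions about how the new options are spelled for list=recentchanges, and the helper names and the account "ExampleBot" are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def recent_changes_url(user=None, exclude_user=None, limit=10, maxlag=5):
    """Build a list=recentchanges query URL, optionally filtered by user."""
    if user and exclude_user:
        raise ValueError("filter by user or exclude a user, not both")
    params = {
        "action": "query",
        "list": "recentchanges",
        "rclimit": limit,
        "maxlag": maxlag,  # ask the server to refuse the request when replication lag is high
        "format": "json",
    }
    if user:
        params["rcuser"] = user  # assumed prefixed form of "user"
    if exclude_user:
        params["rcexcludeuser"] = exclude_user  # assumed prefixed form of "excludeuser"
    return f"{API_ENDPOINT}?{urlencode(params)}"


def is_maxlag_response(status_code):
    """Per the change above, maxlag errors now arrive with HTTP status 503."""
    return status_code == 503


url = recent_changes_url(exclude_user="ExampleBot")
print(url)
print(parse_qs(urlparse(url).query)["rcexcludeuser"])
```

A client that sees `is_maxlag_response(...)` return true would typically wait and retry the same request later instead of treating it as a hard failure.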
Other news
- The Wikimedia Foundation has announced that the Amsterdam-based data center provider EvoSwitch will be providing bandwidth and hosting services (300,000 euros of in-kind support) to the Foundation, with their center serving as a hub for Europe. The sponsorship will allow the Foundation to add new caching servers at the Amsterdam data center. [1] [2]
The Report on Lengthy Litigation
The Arbitration Committee this week announced that there will be another Checkuser and Oversight Election in August, and outlined a schedule for the election.
The Arbitration Committee opened no cases and closed one this week, leaving four open.
Evidence phase
- Seeyou: A case examining the conduct of user Seeyou.
- ADHD: A case examining the dispute on the ADHD article and the conduct of the editors involved therein.
Voting
- A Man In Black: A case brought to examine the conduct of administrator A Man In Black.
- Mattisse: A case, brought when a recent Request for Comment failed to abate concerns regarding her behavior, examining the conduct of User:Mattisse.
Closed
- Obama articles: The Committee mandated that "a group of involved and non-involved editors and administrators" will review the current article probation on Barack Obama and report on its effectiveness, with recommendations for the future. Several editors were admonished for their behavior and placed under editing restrictions. A full summary is available here.