Google Summer of Code 2016 is just about over, and I am absolutely thrilled to have worked with the Wikimedia Foundation under mentor James Salsman and co-mentor Fabian Flock! The application is live and can be accessed here. This post gives a brief overview of my project.
‘Accuracy Review of Wikipedias in Flask’, or AROWF for short, is a peer-review system that helps editors find and review inaccurate content in Wikipedia articles. It does this with algorithms designed to flag suspect content in wiki articles, which editors can then review through a three-way review mechanism. The reviewer bot finds articles in given categories, category trees, and lists. For each such article, the bot creates questions from the following sets:
- Passages with facts and statistics which are likely to have become out of date and have not been updated in a given number of years.
- Passages which are likely unclear.
- Student edits.
- Content from Wikipedia Backlog categories.
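As a rough illustration of the first set above, the sketch below flags passages that mention "recent" or "recently" and have not been updated for a given number of years. The function name `stale_passages`, the `(text, last_updated)` input shape, and the two-year default are assumptions made for this example, not the project's actual code.

```python
import re
from datetime import date

# Matches "recent" or "recently" as whole words, case-insensitively.
RECENT = re.compile(r"\brecent(ly)?\b", re.IGNORECASE)


def stale_passages(passages, today, max_age_years=2):
    """Return the texts that mention 'recent' and are older than the cutoff.

    passages: iterable of (text, last_updated) pairs, where last_updated
    is a datetime.date (hypothetical input shape for this sketch).
    """
    cutoff_days = max_age_years * 365  # approximate; ignores leap days
    return [
        text
        for text, last_updated in passages
        if RECENT.search(text) and (today - last_updated).days >= cutoff_days
    ]


if __name__ == "__main__":
    sample = [
        ("Recent studies show growth.", date(2010, 1, 1)),
        ("Recently updated figure.", date(2016, 6, 1)),
        ("The city was founded in 1850.", date(2005, 1, 1)),
    ]
    print(stale_passages(sample, today=date(2016, 8, 1)))
```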
Question candidates from each set are then ranked, taking into account the pageview count of the article, and the highest-ranking candidates are made into questions. These questions are then opened to reviewers to review and resolve.
A three-way peer-review mechanism ensures that questions are resolved by consensus. Two reviewers work on each question, and in the case of a conflict, it is sent to a third reviewer. The first reviewer provides a solution to the question posed. The second reviewer can either ‘Endorse’ the proposed solution as valid or ‘Oppose’ it as invalid. In case of a conflict, the third reviewer decides between supporting the first or the second reviewer’s viewpoint. An editor then implements the recommended solution. The workflow is represented in the figure below.
Reviewer reputation scores are computed based on whether a reviewer's reviews are accepted by other peer reviewers. Reviews that lowered a reviewer's score can optionally be displayed to that reviewer.
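The three-way review flow described above can be sketched as a small decision function. This is a minimal illustration only: the function name `resolve`, the string constants, and the returned state names are assumptions made for this example, not the application's actual API.

```python
from typing import Optional

ENDORSE = "endorse"
OPPOSE = "oppose"


def resolve(second_verdict: str, third_sides_with: Optional[str] = None) -> str:
    """Return the state of a question given the reviews received so far.

    second_verdict: the second reviewer's verdict on the first reviewer's
        proposed solution, either ENDORSE or OPPOSE.
    third_sides_with: "first" or "second"; only consulted when the first
        two reviewers disagree.
    """
    if second_verdict not in (ENDORSE, OPPOSE):
        raise ValueError(f"unknown verdict: {second_verdict!r}")

    # No conflict: the second reviewer endorsed the proposed solution.
    if second_verdict == ENDORSE:
        return "accepted"

    # Conflict: wait for, or apply, the third reviewer's decision.
    if third_sides_with is None:
        return "needs_third_review"
    if third_sides_with == "first":
        return "accepted"
    if third_sides_with == "second":
        return "rejected"
    raise ValueError(f"unknown side: {third_sides_with!r}")
```

Under these assumptions, an "accepted" result is what would be handed to an editor to implement, while "needs_third_review" keeps the question open.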
- Designed the architecture of the system
- NoSQL approach for the data storage system
- Implemented the /ask, /answer, /recommend, /inspect, /register, /token and /help end-points
- Optional registration system
- Logging functionality
- Wrote scripts to create questions from the following:
  - Articles containing the word ‘recent’
  - Poor Flesch-Kincaid readability scores
  - Student edits
  - Wikipedia Backlog categories
- Ranked extracted article candidates based on standardized scores that include the pageview count
- Deployed the app on Tool Labs and PythonAnywhere
- Internationalization and Localization: Provide language support for the app
- Beta testing
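To make the readability and ranking items in the list above more concrete, here is a minimal sketch that computes a Flesch-Kincaid grade level and ranks candidates by the sum of standardized (z) scores for readability and pageviews. The syllable counter is a crude vowel-group heuristic, and the equal weighting of the two z-scores, the function names, and the `(title, passage, pageviews)` input shape are all assumptions for illustration; the project's scripts may differ.

```python
import re
from statistics import mean, pstdev


def count_syllables(word):
    """Approximate syllables as runs of vowels (a rough heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level; higher means harder to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def z_scores(values):
    """Standardize values to zero mean and unit (population) deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def rank_candidates(candidates):
    """Rank (title, passage, pageviews) tuples, highest priority first.

    Priority is the sum of the readability z-score and the pageview
    z-score (equal weights are an assumption of this sketch).
    """
    if not candidates:
        return []
    grades = z_scores([flesch_kincaid_grade(p) for _, p, _ in candidates])
    views = z_scores([v for _, _, v in candidates])
    scored = [(g + v, title)
              for (title, _, _), g, v in zip(candidates, grades, views)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]
```

Standardizing both signals before adding them keeps the much larger pageview numbers from swamping the readability score.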
- Tool Labs instance: https://tools.wmflabs.org/arowf/
- Documentation URL
- GitHub repository
- Phabricator project task
- Phabricator project workboard
- Project Blog
Primary mentor: James Salsman
Co-mentor: Fabian Flock