GSoC 2016 Project Overview

Google Summer of Code 2016 is just about over, and I am absolutely thrilled to have worked for the Wikimedia Foundation under mentor James Salsman and co-mentor Fabian Flock! The application is live and can be accessed here. This post gives a brief overview of my project.

Synopsis

‘Accuracy Review of Wikipedias in Flask’, or AROWF for short, is a peer-review system that helps editors find and review inaccurate content in Wikipedia articles. It relies on algorithms designed to flag suspect content in wiki articles, which editors can then review through a three-way review mechanism. The reviewer bot finds articles in given categories, category trees, and lists. For each such article, the bot creates questions from the following sets:

  1. Passages with facts and statistics which are likely to have become out of date and have not been updated in a given number of years.
  2. Passages which are likely unclear.
  3. Student edits.
  4. Content from Wikipedia Backlog categories.
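As a rough illustration of how the first two sets could be detected, here is a minimal sketch. It assumes the two signals named in the task list later in this post: the word ‘recent’ as a hint that a passage may be out of date, and the Flesch-Kincaid grade level as a measure of unclear writing. The function names and the vowel-run syllable counter are my own simplifications for this example, not the code used in AROWF.

```python
import re


def mentions_recent(passage: str) -> bool:
    """Return True if the passage contains the standalone word 'recent'."""
    return re.search(r"\brecent\b", passage, flags=re.IGNORECASE) is not None


def count_syllables(word: str) -> int:
    """Crude syllable estimate (assumption): count runs of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level; higher values mean harder-to-read text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

A passage could then be treated as a question candidate when `mentions_recent` returns True or when its grade level exceeds some threshold. The threshold itself would have to be tuned and is not specified here.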

Question candidates from each set are then ranked, taking into account the pageview count of the article, and the highest-ranking candidates are made into questions. These questions are then open to reviewers to review and resolve.

A three-way peer-review mechanism ensures that questions are resolved by consensus. Two reviewers work on each question. The first reviewer provides a solution to the question posed. The second reviewer then chooses to either ‘Endorse’ the proposed solution as valid or ‘Oppose’ it as invalid. In case of a conflict, the question is sent to a third reviewer, who decides between supporting the first or the second reviewer’s viewpoint. An editor then implements the recommended solution. The workflow is represented in the figure below.

Reviewer reputation scores are computed from how acceptable each reviewer’s reviews are to the other peer reviewers. Reviews that lowered the scores can optionally be displayed to the reviewers.

File:Accuracy review.png
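The decision logic of the three-way review can be sketched in a few lines. This is only an illustration of the flow described above, not the actual AROWF code: the `Vote` and `Outcome` names and the `resolve` function are hypothetical and were made up for this example.

```python
from enum import Enum
from typing import Optional


class Vote(Enum):
    ENDORSE = "endorse"
    OPPOSE = "oppose"


class Outcome(Enum):
    AWAITING_SECOND = "awaiting second reviewer"
    AWAITING_THIRD = "awaiting third reviewer"
    ACCEPTED = "solution accepted"
    REJECTED = "solution rejected"


def resolve(second_vote: Optional[Vote],
            third_sides_with_first: Optional[bool] = None) -> Outcome:
    """Decide the state of a question once the first reviewer has proposed a solution."""
    if second_vote is None:
        return Outcome.AWAITING_SECOND
    if second_vote is Vote.ENDORSE:
        # Both reviewers agree, so no third reviewer is needed.
        return Outcome.ACCEPTED
    # The second reviewer opposed: a third reviewer breaks the tie.
    if third_sides_with_first is None:
        return Outcome.AWAITING_THIRD
    return Outcome.ACCEPTED if third_sides_with_first else Outcome.REJECTED
```

For example, `resolve(Vote.OPPOSE, third_sides_with_first=True)` yields `Outcome.ACCEPTED`, because the third reviewer supported the first reviewer's solution.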

Tasks completed

  • Designed the architecture of the system
  • Adopted a NoSQL approach for the data storage system
  • Implemented the /ask, /answer, /recommend, /inspect, /register, /token and /help end-points
  • Added an optional registration system
  • Added logging functionality
  • Wrote scripts to create questions from the following:
    • Articles containing the word ‘recent’
    • Poor Flesch-Kincaid readability scores
    • Student edits
    • Wikipedia Backlog categories
  • Ranked extracted article candidates based on standardized scores that include the pageview count
  • Deployed the app on ToolLabs and PythonAnywhere
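To give an idea of what ranking on standardized scores can look like, here is a small sketch. It assumes each candidate carries a numeric `signal` (for instance a readability grade) and a `pageviews` count, and that the two z-scores are simply added with equal weight. Those field names and the equal weighting are assumptions for illustration; the exact combination used in AROWF may differ.

```python
from statistics import mean, pstdev
from typing import Dict, List


def z_scores(values: List[float]) -> List[float]:
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so none stands out.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def rank_candidates(candidates: List[Dict]) -> List[Dict]:
    """Sort candidates by the sum of their standardized signal and pageview scores."""
    signal_z = z_scores([c["signal"] for c in candidates])
    views_z = z_scores([c["pageviews"] for c in candidates])
    scored = [
        {**c, "score": s + v}
        for c, s, v in zip(candidates, signal_z, views_z)
    ]
    return sorted(scored, key=lambda c: c["score"], reverse=True)


if __name__ == "__main__":
    sample = [
        {"title": "A", "signal": 3, "pageviews": 100},
        {"title": "B", "signal": 1, "pageviews": 10000},
        {"title": "C", "signal": 2, "pageviews": 500},
    ]
    for candidate in rank_candidates(sample):
        print(candidate["title"], round(candidate["score"], 3))
```

Standardizing first keeps a raw count as large as pageviews from drowning out a small-valued signal when the two are combined.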

Next steps

  • Internationalization and Localization: Provide language support for the app
  • Beta testing

Quick Links

Tool Labs instance: https://tools.wmflabs.org/arowf/
Documentation URL
GitHub repository
GitHub commits
Phabricator project task
Phabricator project workboard
Project Blog

Mentors

Primary mentor: James Salsman
Co-mentor: Fabian Flock

One comment

  1. Thanks for posting; it would be useful if you used a category or tag for wiki-related things, so that people can easily subscribe (also via https://meta.wikimedia.org/wiki/Planet_Wikimedia ).
