Submissions/Wikistats Status Report

This is an accepted submission for Wikimania 2012.


Submission no.

115

Title of the submission
Editor profiling and matching based on userbox categories
Type of submission (workshop, tutorial, panel, presentation)
presentation
Author of the submission
Erik Zachte
E-mail address
ezachte at wikimedia dot org
Username
Erik Zachte
Country of origin
Netherlands
Affiliation, if any (organization, company etc.)
Wikimedia Foundation
Personal homepage or blog
http://infodisiac.com/blog
Abstract (at least 300 words to describe your proposal)

Editor userboxes have long been ingrained in Wikimedia culture. For many editors they are the preferred way to personalize their online identity. They are succinct, easily recognized and often playful. They are one of the oldest and arguably most popular 'social features' of Wikimedia software, and they predate similar features on social sites such as Facebook.

Through its categorization scheme, MediaWiki makes it very easy to find other users who, for example on the English Wikipedia, have 'an understanding of the Han script' (352 in total, of whom 62 at advanced level or better). Some types of userboxes (notably Babel boxes) have a refined subcategorization scheme.
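As a rough illustration of how such Babel subcategories might be counted without relying on the userbox text itself, the sketch below parses category names and counts users per language and level. It assumes Babel categories follow the 'User <code>-<level>' naming used on the English Wikipedia (levels 0 to 5, plus N for native), with the 'Category:' prefix already stripped, and it assumes "advanced level or better" means level 3, 4, 5 or N. The function names and the toy user data are hypothetical and are not real counts.

```python
import re

# Assumed naming scheme: "User <code>-<level>", e.g. "User zh-3" or "User en-N".
BABEL_RE = re.compile(r"^User ([A-Za-z]+(?:-[A-Za-z]+)*)-([0-5N])$")

# Assumption: level 3 is "advanced"; 4, 5 and N (native) rank above it.
ADVANCED_OR_BETTER = {"3", "4", "5", "N"}


def parse_babel(category):
    """Return (code, level) for a Babel-style category name, or None if it does not match."""
    m = BABEL_RE.match(category)
    return (m.group(1), m.group(2)) if m else None


def count_levels(user_categories, code):
    """Count users with any Babel level for `code`, and those at advanced level or better."""
    total = advanced = 0
    for cats in user_categories.values():
        levels = set()
        for cat in cats:
            parsed = parse_babel(cat)
            if parsed and parsed[0] == code:
                levels.add(parsed[1])
        if levels:
            total += 1
            if levels & ADVANCED_OR_BETTER:
                advanced += 1
    return total, advanced


# Toy data (hypothetical users); non-Babel categories are simply ignored.
users = {
    "Alice": ["User zh-3", "User en-N"],
    "Bob": ["User zh-1", "Wikipedians who like tea"],
    "Carol": ["User zh-N"],
    "Dave": ["User fr-2"],
}
print(count_levels(users, "zh"))  # (3, 2)
```

Because only the language code and level are read, the same parser would in principle work on any wiki that uses this naming, which is one way the categorization scheme could help with language independence.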

Given this rich content, surprisingly little has been done so far to mine it and use it for profiling, statistics and perhaps (opt-in?) matchmaking.

Part of the technical challenge is to mine these boxes in a language-independent manner. The categorization scheme can be very helpful here.

Part of the social challenge is to avoid embarrassing users with the findings. User privacy may be an issue here, if not explicitly (all data is by its very nature openly available), then perhaps as an unintended consequence: aggregation and filtering could be used to single out a person or small group, and lead to exposure beyond what the user originally intended.

Nonetheless, many users who announce their skills and/or interests on their user page would presumably be happy to learn which other users have a similar profile. This could further cooperation and bonding within the community.

Some 'fuzzy matching' based on proximity in the category tree could help find corresponding skills that are similar but not identical. For example, two users with a great interest in languages may not have exactly matching Babel boxes because their advertised skill levels per language differ. Matchmaking could consist of (opt-in) recommendations to visit the user page of someone else with similar interests. A prudent approach would be not to make these mined data user-searchable, and not to explicitly expose which matches were found. A perhaps more controversial approach would be to make these mined data available to a public search engine.
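One possible reading of "proximity in the category tree" is sketched below: the distance between two categories is the number of edges between them via their lowest common ancestor, and two profiles are compared by matching each category to its closest counterpart on the other side. The parent links, the 1 / (1 + distance) scoring, the symmetric averaging and all names are hypothetical choices for illustration. Real MediaWiki categories form a graph where a category can have several parents (and occasionally cycles), which this single-parent tree deliberately ignores.

```python
# Hypothetical single-parent category tree (child -> parent).
PARENT = {
    "User zh-3": "User zh", "User zh-4": "User zh", "User zh-N": "User zh",
    "User ja-2": "User ja",
    "User zh": "Languages", "User ja": "Languages",
}


def ancestors(cat):
    """Return the path from `cat` up to its root, starting with `cat` itself."""
    path = [cat]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a, b):
    """Edges between a and b via their lowest common ancestor, or None if unrelated."""
    pa, pb = ancestors(a), ancestors(b)
    depth_in_b = {c: i for i, c in enumerate(pb)}
    for i, c in enumerate(pa):
        if c in depth_in_b:  # first shared node is the lowest common ancestor
            return i + depth_in_b[c]
    return None


def proximity(a, b):
    """1.0 for identical categories, decreasing with tree distance; 0.0 if unrelated."""
    d = tree_distance(a, b)
    return 0.0 if d is None else 1.0 / (1 + d)


def similarity(cats_a, cats_b):
    """Symmetric profile similarity: mean best-match proximity, averaged over both directions."""
    if not cats_a or not cats_b:
        return 0.0

    def directed(xs, ys):
        return sum(max(proximity(x, y) for y in ys) for x in xs) / len(xs)

    return (directed(cats_a, cats_b) + directed(cats_b, cats_a)) / 2


alice, bob, carol = ["User zh-3"], ["User zh-4"], ["User ja-2"]
print(round(similarity(alice, bob), 3))    # 0.333
print(round(similarity(alice, carol), 3))  # 0.2
```

In this toy example a user at zh-3 scores closer to one at zh-4 than to one at ja-2, even though neither pair has exactly matching Babel boxes.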

I will show early results based on data mining of these boxes.

Track
WikiCulture and Community; Research, Analysis, and Education
Length of presentation/talk
25 Minutes
Will you attend Wikimania if your submission is not accepted?
yes
Slides or further information (optional)
later
Special request as to time of presentations


Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with four tildes (~~~~).

  1. Stu (talk) 18:00, 14 March 2012 (UTC)
  2. Carolmooredc (talk) 14:38, 16 March 2012 (UTC)
  3. Zellfaze (talk) 16:13, 19 March 2012 (UTC)
  4. CT Cooper · talk 17:59, 19 March 2012 (UTC)
  5. SarahStierch (talk) 21:28, 21 March 2012 (UTC)
  6. Eloquence (talk) 00:34, 22 March 2012 (UTC)
  7. Rangilo Gujarati (talk) 13:56, 26 March 2012 (UTC)
  8. I have hardly any userboxes on my user page, but this certainly sounds like a new way of approaching them! Graham87 (talk) 10:38, 31 March 2012 (UTC)
  9. OrenBochman (talk) 20:53, 26 May 2012 (UTC)
  10. Add your username here.