I’m very lucky to work only a short walk from MIT. A few days ago my friend Goshe, who works over at the HBS library, emailed me about a presentation at the Center for Collective Intelligence. The talk, Visualizing Wikipedia, was given by Fernanda Viegas and Martin Wattenberg, both of the IBM Visual Communications Lab.
The pair have been studying Wikipedia for the past four years, and over that time they have been struck by three things:
- The fact that it has survived at all
- The scale of individual production/contribution
- The emergence of a bureaucracy
Back in 2003, there were people who doubted that Wikipedia would survive. Some thought that it would simply sink into becoming the online equivalent of a men’s room wall: covered with graffiti and regularly vandalized. But even then, people were contributing well, formally and thoughtfully. The pair were curious to see how articles had evolved, expanded and been edited over time, but the sheer volume of data made this difficult. So they developed HistoryFlow, a visual tool for displaying and analyzing the history of a document.
HistoryFlow creates one line for each version of an article. Each author is represented by a color, and the length of the line indicates the length of the article. Displaying a series of these lines builds up an image of an article’s creation and revision history, broken down by author. Additional tools show the degree to which content has remained constant over time. One finding was that fixes happen very quickly, often only minutes after questionable content has been added. Another is that people tend to view and edit specific sections or paragraphs of an article rather than rearranging the article as a whole. (This, they thought, may be due in part to the limitations of the editing tools.)
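HistoryFlow’s actual implementation wasn’t covered in the talk, but the basic layout is easy to sketch. The Python below is a minimal, hypothetical illustration: it assumes each version of an article has already been split into segments tagged with their author (the `Segment` class, `history_flow_rows` and `render` are my own names, not HistoryFlow’s), and it uses one letter per author as a stand-in for color.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A span of article text attributed to one author (hypothetical model)."""
    author: str
    text: str


def history_flow_rows(versions, chars_per_unit=1):
    """Turn each version into one row of (author, width) pairs.

    Each width is proportional to the length of that author's text, so the
    total width of a row tracks the length of that version of the article.
    """
    return [
        [(seg.author, len(seg.text) // chars_per_unit) for seg in version]
        for version in versions
    ]


def render(rows, symbols):
    """Draw each row as a string, one character per author standing in for a color."""
    return ["".join(symbols[author] * width for author, width in row) for row in rows]


if __name__ == "__main__":
    versions = [
        [Segment("alice", "Boston is a city.")],
        [Segment("alice", "Boston is a city."), Segment("bob", " It is in Massachusetts.")],
        [Segment("alice", "Boston is a city."), Segment("carol", " It is the capital.")],
    ]
    symbols = {"alice": "A", "bob": "B", "carol": "C"}
    for line in render(history_flow_rows(versions), symbols):
        print(line)
```

Stacking the rows in order gives a crude version of the picture: a run that keeps the same letter from row to row is text that survived, and a change in letter (bob’s sentence being replaced by carol’s in the last row) marks an edit.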
As they analyzed the data, they noted two interesting things. The first was that there were patterns at all (edit wars, for example, show up clearly as a sharp, rapid series of peaks and valleys). The second was that when they showed the patterns to people unfamiliar with Wikipedia, those people quickly got a sense of the site and its community.
The most prolific contributor made 122,387 edits over a two-year period. That works out to roughly one every nine minutes, around the clock. Given the volume of data this represents, analysis was again a challenge. The team came up with a novel way to abstract the information. They looked at the first word of every revision and assigned it a hue based on its first letter, a brightness based on its second letter and a saturation based on its third letter (grayscale was used for anything starting with a number). While this approach destroys the conventional meaning of the text, it unlocks information that was otherwise unavailable.
(Unfortunately, I do not have an example of one of the patterns generated by this system.)
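The talk didn’t spell out exactly how letters were turned into numbers, so the Python below is only a sketch of the idea under my own assumptions: letters a to z are spread evenly around the hue wheel, brightness and saturation are kept between 0.4 and 1.0 so no color goes fully black, and the defaults for short words, punctuation and digits are guesses. All the function names are hypothetical; only `colorsys.hsv_to_rgb` comes from Python’s standard library.

```python
import colorsys
import string


def _letter_index(ch):
    """Return 0-25 for a-z (case-insensitive), or None for anything else."""
    return ord(ch.lower()) - ord("a") if ch in string.ascii_letters else None


def _scale(idx):
    """Map a letter index onto 0.4-1.0 so colors never go fully dark or washed out.

    Assumption: a missing or non-letter character gets the midpoint, 0.7.
    """
    return 0.7 if idx is None else 0.4 + 0.6 * idx / 25


def word_to_rgb(word):
    """Color a word by its first three letters: hue, brightness, saturation."""
    if not word:
        return (0.0, 0.0, 0.0)  # assumption: empty input -> black
    if word[0] in "0123456789":
        level = int(word[0]) / 9  # grayscale; assumption: shade set by the digit
        return (level, level, level)
    first = _letter_index(word[0])
    if first is None:
        return (0.5, 0.5, 0.5)  # assumption: punctuation etc. -> mid gray
    second = _letter_index(word[1]) if len(word) > 1 else None
    third = _letter_index(word[2]) if len(word) > 2 else None
    hue = first / 26  # first letter -> hue, kept below 1.0 so 'z' doesn't wrap to red
    # colorsys takes (hue, saturation, value): third letter -> saturation, second -> brightness
    return colorsys.hsv_to_rgb(hue, _scale(third), _scale(second))


def first_word(text):
    """First whitespace-separated word of a revision, or '' if there is none."""
    parts = text.split()
    return parts[0] if parts else ""
```

Applied to a list of revisions, `[word_to_rgb(first_word(r)) for r in revisions]` yields one color per edit. Because any two words sharing their first three letters get the same color, habitual phrasings show up as repeated bands, which is one way such a picture could surface editing habits even though the words themselves are unreadable.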
The patterns created by this technique exposed a great deal of information about editing behavior over time: not only the types of articles people edit, but also the editing practices they use and how their participation changes. Robotic editors (bots) also stood out clearly through their distinctive editing patterns.
Robots aside, the question came up of how someone could practically make 100,000-plus edits, revisions and decisions over two years. The answer was that they don’t, at least not as isolated individual decisions. Project-based decision making occurs instead: some users spend time organizing work and describing tasks that need to be done, while others react and respond to this organization. This points to the third surprise they found: bureaucracy.
The growth and expansion of Wikipedia is hard to imagine, and a HistoryFlow image illustrated it well. For one article, the history of the first two years of study took up only a few inches; the history of the next two years was almost ten times as long. Managing that growth has required systems and structures to emerge, and they have done so organically over time.
When people think about Wikipedia’s growth, they assume it comes from additional articles and images. While that is true, page counts have also grown in other areas, specifically those concerned with the operation of Wikipedia itself. The visualization used to illustrate this change (which I do not have) also shows the evolution in participation that Wikipedians go through over time. They often start by working in the main namespace and slowly move to meta-discussion topics (those about Wikipedia itself or its management).
The Talk pages offer a view into the depth and complexity of the topics people are discussing, and they demonstrate the remarkable degree of coordination that allows Wikipedia to function. One example of this coordination is the process by which an article is selected as a featured article. Without going into detail on the process, the key point is that a workflow has grown up that is not unlike those used by other, more formally structured, organizations.
Another example of the evolving bureaucracy is the growing body of guidelines that describe and define functions and activities on the site. The guidelines are not mere window dressing; they are used and referred to constantly in the discussion areas around the articles.
All of this demonstrates that self-governance can work. A big part of this is that conversations about the site occur on the site itself and are persistent, which means a formal body of shared knowledge is available to everyone. It also shows that peer production can happen successfully in a non-hierarchical way.
It was incredibly cool to see how visualization could be used to extract the information that led to these ideas and findings. If you’re interested in trying some visualization tools yourself, check out Many Eyes, a site created by Viegas and Wattenberg.
[tags]MIT, IBM, CCI, Collective Intelligence, Wikipedia, Visualization, IBM Visual Communications Lab, HistoryFlow, Fernanda Viegas, Martin Wattenberg, Many Eyes, peer production[/tags]