Why Wikipedia’s “Editor Exodus” Doesn’t Matter: A Closer Look at the Data

November 30, 2009

On November 23, the Wall Street Journal reported that “unprecedented” numbers of Wikipedians were leaving the site and suggested that Wikipedia was in grave danger.  The story was later picked up by CNet and other sources.

Wikimedia produced a rebuttal saying that everything was fine.  However, the media is portraying the battle as a “he said / she said” affair where it is not clear whether Wikipedia is in trouble or not.  We think this fails to credit Wikipedia as holding the stronger of the two positions.

In this article, we discuss why Wikipedia’s “editor exodus” doesn’t matter.  Simply put, the WSJ has (1) interpreted the data incorrectly, (2) ignored the fact that most content on Wikipedia is created by a small, core group of editors [a fact validated by researchers at U. Minnesota], (3) ignored social science research indicating that having fewer editors may increase article quality, and (4) ignored the exodus of spammers.

The exodus described is the departure of 49,000 editors from the English language version during the first three months of 2009, compared with only 4,900 in the same quarter a year earlier.  The article states that editors are leaving faster than new editors are joining.  All the statements made in the WSJ article are supported only by data provided by Dr. Felipe Ortega of the Universidad Rey Juan Carlos in Madrid.  The data is in his thesis on Wikipedia (the relevant data starts on page 116).

Ortega’s Analysis

Dr. Ortega shows that more editors are leaving Wikipedia than joining it, as shown by the greater number of editor deaths than births in the following graph.  He defines a birth as the moment when someone edits Wikipedia for the first time.  He defines a death as the moment when someone makes their last edit to Wikipedia, which is not followed by any further edits.

The 49,000 “log-offs” reported by the WSJ are what Dr. Ortega would call in his thesis 49,000 deaths.

The following image is the birth/death graph for the English version of Wikipedia from page 123 of Dr. Ortega’s thesis.

Problem 1: Ortega’s Basic Errors

The first thing to point out is that, although it is true that deaths outpace births, the gap between deaths and births is tiny compared to the total number of editors joining and leaving the site.  This alone suggests that any predictions of Wikipedia’s demise are premature.

Second, the graph shows that deaths outpace births in recent years and not in past years, which is probably why the WSJ pounced on this topic as something new.  However, it is possible that the entire effect is due only to a methodological error.  More recent years on the graph will tend to overcount deaths because the editors have had no opportunity to come back and prove they are not dead.  In older years, an editor may have “died” only to come back 2 years later and therefore not be counted as dead.  For example: An editor who “logged off” in 2006 and came back in 2008 is not counted as dead because he is seen to be still editing; however, an editor who “logged off” in 2008 and will actually come back in 2010 is counted as dead because his edits in 2010 haven’t happened yet.
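To make this overcounting concrete, here is a minimal Python sketch.  It assumes, as a simplification of Dr. Ortega’s definitions, that a birth is the year of an editor’s first visible edit and a death is the year of the last visible edit.  The editors, the years, and the function name are invented purely for illustration.

```python
from collections import Counter

def births_and_deaths(edit_years_by_editor, cutoff_year):
    """Count births (first edit) and apparent deaths (last edit) per year,
    using only the edits visible up to and including cutoff_year."""
    births, deaths = Counter(), Counter()
    for years in edit_years_by_editor.values():
        seen = [y for y in years if y <= cutoff_year]
        if not seen:
            continue  # this editor had not appeared yet at the cutoff
        births[min(seen)] += 1
        deaths[max(seen)] += 1
    return births, deaths

# Hypothetical editors: each list holds the years in which they edited.
editors = {
    "a": [2006, 2008],  # paused after 2006, came back in 2008
    "b": [2008, 2010],  # paused after 2008, comes back in 2010
    "c": [2007],
}

# The same editors, measured with data ending in 2009 and in 2011.
_, deaths_seen_in_2009 = births_and_deaths(editors, cutoff_year=2009)
_, deaths_seen_in_2011 = births_and_deaths(editors, cutoff_year=2011)

print(deaths_seen_in_2009[2008], deaths_seen_in_2011[2008])
```

With data ending in 2009, editor “b” is counted among the 2008 deaths.  Once the 2010 edits are visible, that death disappears, so the 2008 death count falls even though nobody’s behavior changed.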

Third, Dr. Ortega’s data is limited to birth / death data, and he does no analysis on the number of active editors in a given month.  By Wikipedia’s count, the number of active editors has been constant for the past year.  The number of active editors in a given month would seem to be a far more direct and important measure of Wikipedia’s success than the birth / death rate.

Problem 2: Top Contributors do Most of the Work

Besides the methodological problems with Dr. Ortega’s analysis, there is another considerable problem, which is that many of the 49,000 editors that left were users who had only made 1 – 4 edits.  The vast majority of content on Wikipedia is created by users who repeatedly make many edits (the top contributors).  Wikipedia actually defines an editor as someone who makes 5 edits or more.  The definition makes sense because many people make a few experimental edits and leave.  It is senseless to judge the strength of Wikipedia’s community based on these people.

Reid Priedhorsky and other researchers at the University of Minnesota have actually measured the amount of content created by the top contributors versus that created by lesser contributors.  Although Wikipedia is touted as a “democratic” source of knowledge, the empirical fact is that Wikipedia is mainly written by a small number of users.

Priedhorsky measured the amount of content created by different editors using what he calls persistent word views (PWV), which is just a fancy way of counting the number of times words written by an editor are shown to Wikipedia readers.  An editor gets 1 PWV for each word that he wrote that appears on someone’s screen.  If an editor writes the word “cat” on a page that is viewed by 10 people and “dog” on a page that is viewed by 5 people, then that editor has 15 PWVs.
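The cat/dog arithmetic can be written as a short Python sketch.  It assumes each word stays on the page for every view counted, which glosses over how the researchers track words that are later deleted.  The page names and the function name are hypothetical.

```python
def persistent_word_views(words_by_page, views_by_page):
    """Sum over pages of (words the editor wrote there) * (views of that page)."""
    return sum(
        words * views_by_page.get(page, 0)
        for page, words in words_by_page.items()
    )

# One word ("cat") on a page viewed 10 times, one word ("dog") on a page viewed 5 times.
words_written = {"cat_page": 1, "dog_page": 1}
page_views = {"cat_page": 10, "dog_page": 5}

total_pwv = persistent_word_views(words_written, page_views)
print(total_pwv)  # 15, matching the example above
```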

The following graph shows how many PWVs different groups of editors have.  The graph goes up to 100%.  If a group of editors had 100% of the PWVs, this would mean they were responsible for every word of Wikipedia ever viewed by any reader.  The graph shows the amount of PWVs attributable to the top 10% of contributors (the editors who are the top 10% most active), the top 1% of contributors, and the top 0.1% of contributors.

In the words of Priedhorsky: “Editors who edit many times dominate what people see when they visit Wikipedia. The top 10% of editors by number of edits contributed 86% of the PWVs, and top 0.1% contributed 44% – nearly half! The domination of these very top contributors is increasing over time.”
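For readers who want to see how a figure like “the top 10% contributed X% of the PWVs” could be computed, here is a rough Python sketch.  It ranks editors by edit count and sums the PWVs of the top slice.  The ten editors and their numbers are toy values made up for this illustration, not data from the study, and the function name is our own.

```python
def top_share(pwv_by_editor, edits_by_editor, fraction):
    """Share of all PWVs held by the top `fraction` of editors, ranked by edit count."""
    ranked = sorted(edits_by_editor, key=edits_by_editor.get, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # always include at least one editor
    total = sum(pwv_by_editor.values())
    return sum(pwv_by_editor[e] for e in ranked[:k]) / total

# Toy data: one heavy editor and nine occasional ones (all values invented).
edits = {"e0": 500, "e1": 9, "e2": 8, "e3": 7, "e4": 6,
         "e5": 5, "e6": 4, "e7": 3, "e8": 2, "e9": 1}
pwvs = {"e0": 700, "e1": 60, "e2": 50, "e3": 40, "e4": 40,
        "e5": 30, "e6": 30, "e7": 20, "e8": 20, "e9": 10}

share_top_10_percent = top_share(pwvs, edits, 0.10)
print(f"{share_top_10_percent:.0%}")
```

In this toy data set, the single most active editor (the top 10% of ten editors) accounts for 700 of the 1,000 PWVs.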

Therefore, though losing people who make one or two edits hinders Wikipedia’s goal of democratization of knowledge, in the sense of allowing everyone to contribute to their vast collection of articles, it doesn’t really affect the amount of content Wikipedia has or the rate at which this content grows.

The vast majority of Dr. Ortega’s 49,000 dead users are not even in the top 10% of contributors.  So any loss of their contribution is limited to 100 – 86 = 14% of Wikipedia’s PWVs.

Dr. Ortega does take the time to separately discuss how often editors in the top 10% remain alive.  His data shows that only 30% of editors in the top 10% of contributors remain a top 10% contributor after the passage of 500 days.  However, of the editors leaving the top contributor group, 40% remain alive (making fewer edits) for another 500 days.  This data does suggest a fairly heavy turnover among top 10% contributors but nothing terribly bad.  After all, 500 days is about 1.4 years and 500 days + 500 days is about 2.7 years.  These are pretty long periods of time to be active on any website.

More importantly, Dr. Ortega says nothing about the death rate of the top 1% users and the top 0.1% users, who account for over 66% of Wikipedia as measured in PWVs.  In the absence of data, there is no reason to suspect that these most important editors are leaving.

Problem 3: Fewer Editors Often Correlates with Higher Quality

Studies have shown that having fewer editors on a Wikipedia article can increase its quality on certain measures, such as its readability and coherence.  Intuitively, having many editors can lead to edit wars, lack of coordination, and lack of agreement about an article’s direction.

In fact, most Wikipedia articles are written by a small group of editors.  Jimmy Wales, co-founder of Wikipedia, told CNet: “One of the things that’s important to know about Wikipedia is that the entries that are edited by hundreds of people are really anomalies.”

Outdated Idea: More Editors = Higher Quality

There was briefly a time when social scientists believed that Wikipedia articles with more editors were of higher quality.  This idea was espoused by Wilkinson and Huberman of HP Labs.  The following graph shows that featured articles on Wikipedia tend to have more editors than regular articles.  Since articles become featured because of their high quality, the researchers thought that this indicated that having more editors caused an increase in quality.

The red line on the graph shows featured articles and the black line shows regular articles.  As you can see, the featured articles have more editors.  The bars sticking out from the lines just show standard deviation, which you can ignore.  The horizontal axis shows Google page rank.  The researchers wanted to compare only featured articles and regular articles of the same page rank because a higher page rank could make an article more visible and attract more editors.

The Wilkinson and Huberman idea has fallen out of favor.  The fatal flaw of the study was that it did not control for the number of editors before and after the article became featured.  Clearly, an article that becomes featured will draw more editors.  Therefore, it is just as likely that the featured status causes the article to have more editors as that the higher number of editors causes the featured status.

Modern Idea: More Editors = Lower Quality (sometimes)

The Wilkinson and Huberman study has been supplanted by newer studies showing that having fewer editors can sometimes result in higher quality Wikipedia articles.

Dr. Niki Kittur and Prof. Robert Kraut of Carnegie Mellon University demonstrated that an important factor in whether increasing the number of editors actually helped a Wikipedia article was the concentration of edits.  An article with high concentration of edits is one where a few editors do most of the work, and other editors make only minor additions.  An article with low concentration is one where the editing work is spread evenly among editors.

High concentration suggests that a few editors are taking the lead on the article, whereas low concentration is more like the democratization idea, where everyone does an equal share.
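One way to put a number on concentration is a Gini coefficient over each editor’s edit count for an article.  We are assuming this as a reasonable stand-in for illustration; it may not match the researchers’ exact calculation.  A value near 0 means the work is spread evenly, and a value near 1 means a few editors do almost everything.

```python
def gini(edit_counts):
    """Gini coefficient of per-editor edit counts for one article.
    0 = perfectly even split; values near 1 = a few editors dominate."""
    xs = sorted(edit_counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on the sorted counts.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

# Hypothetical articles with 20 edits each, split differently among 4 editors.
low_concentration = gini([5, 5, 5, 5])    # everyone does an equal share
high_concentration = gini([1, 1, 1, 17])  # one editor takes the lead

print(low_concentration, high_concentration)
```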

They published the following graph showing that an article with low concentration decreases in quality as more editors are added.

The next graph shows that articles with low concentration have more interdependent issues when more editors are added.  The researchers defined an interdependent issue as one that editors cannot easily resolve independently.  Interdependent issues are things like readability, flow, and coherence.

The researchers suggest that the reason having many editors with a low concentration decreases article quality is that it is hard to coordinate when there are so many individuals who each have so much editing power.

Problem 4: Loss of Spammers

As noted earlier, Dr. Ortega’s 49,000 dead users are mostly low-activity editors.  Commenters nbauman and KlaymenDK pointed out on Slashdot that the loss of low-activity editors might be explained by the decline in spammers after Wikipedia added the no-follow tag to all of its outbound links.  The no-follow tag means that the link is not used for computing page rank, so spammers have less incentive to add links to their own site.  Spammers would likely be low-activity editors because they add an outbound link from Wikipedia to their own site and then leave.

This is a clear case where losing these editors is a win for Wikipedia.

The 49,000 dead users could very well be the result of Wikipedia’s crackdown on spammers, but unfortunately, there is no empirical analysis available to support or refute this hypothesis.  Dr. Ortega did not analyze how many of the “logged off” editors were spammers.  This could easily have been done by determining whether the editors had posted mostly external links.
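As a rough idea of what such a check might look like, here is a hedged Python sketch.  It assumes we have the text each editor added in each edit, and it flags an editor when more than half of those edits contain an external URL.  The 50% threshold, the URL pattern, and the function name are our own assumptions, not anything taken from Dr. Ortega’s thesis.

```python
import re

# Simplified pattern for an external link; real wiki markup is more varied.
URL_PATTERN = re.compile(r"https?://\S+")

def looks_like_link_spammer(added_texts, threshold=0.5):
    """Return True when more than `threshold` of an editor's edits add an external URL."""
    if not added_texts:
        return False
    with_link = sum(1 for text in added_texts if URL_PATTERN.search(text))
    return with_link / len(added_texts) > threshold

# Hypothetical edit histories.
drive_by = ["Great deals at http://example.com"]
regular = ["Fixed a typo", "Added a source: http://example.org/paper", "Rewrote the intro"]

print(looks_like_link_spammer(drive_by), looks_like_link_spammer(regular))
```

A heuristic like this would misclassify some legitimate editors who mostly add citations, so it would only give a rough estimate of how many of the dead users were spammers.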

Conclusion

We do not doubt Dr. Ortega’s expertise on Wikipedia.  After all, he wrote a 200-page thesis about it.  Nor do we doubt that Wikipedia could improve its user experience and make the system friendlier for newbies to reduce the number of user deaths.

However, the data does not support the conclusion that Wikipedia is about to meet its demise or even that its user base has been affected in any substantial way for the reasons described above: (1) methodological problems with Dr. Ortega’s analysis, (2) the vast majority of content on Wikipedia is created by repeat contributors who are not leaving, (3) having fewer editors on an article does not necessarily decrease its quality, and (4) many of the editors who left may have been spammers.

Sources

Kittur, A., Lee, B., and Kraut, R. (2009). Coordination in Collective Intelligence: The Role of Team Structure and Task Interdependence. CHI ’09.

Kittur, A., and Kraut, R. (2008). Harnessing the Wisdom of Crowds in Wikipedia: Quality through Coordination. CSCW ’08.

Ortega, J. (2009). Wikipedia: A Quantitative Analysis.  Doctoral Thesis.  Universidad Rey Juan Carlos.

Priedhorsky, R., Chen, J., Lam, S., Panciera, K., Terveen, L., and Riedl, J. (2007). Creating, Destroying, and Restoring Value in Wikipedia.  GROUP ’07.

Wilkinson, D. & Huberman, B. (2007). Cooperation and Quality in Wikipedia. WikiSym ’07.

Images and graphs come from these articles.

