Wiki Loops: Visualizing Patterns in Wikipedia Links

Jamie Green

What’s the difference between NASCAR and philosophy? According to xkcd’s Randall Munroe, only 5 Wikipedia pages.

Randall Munroe, a former NASA roboticist turned web cartoonist, is best known for his humorous takes on subjects ranging from math and science to love and poetry. His primary outlet, www.xkcd.com, pulls in an estimated 2.8 million views per day.

Back in 2011, he set his comedic sights on Wikipedia, highlighting how reliant we’ve all become on the crowd-sourced encyclopedia for our knowledge about even the most basic things. The comic itself doesn’t stand out as particularly noteworthy, but Mr. Munroe includes a “hover-text” with every comic: hover your mouse over an image on his site, and you’ll be rewarded with an additional joke or anecdote. In this case, we see the following:

Fig. 1: https://www.xkcd.com/903 “Extended Mind,” the hover-text of which inspired this blog post. These comics are made free for non-commercial use via a Creative Commons Attribution-NonCommercial 2.5 License.

“Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at ‘Philosophy’.”

When I first read this, I was of course intrigued. I immediately opened Wikipedia in a new browser window and began to test. I picked a page that surely couldn’t have any connection to Philosophy, “NASCAR,” and began to follow his instructions.

If I wanted to prove Mr. Munroe wrong, I did not get off to a good start. The first non-italicized, non-parenthetical link on the NASCAR page is “business.” Business leads to Entity, Entity leads to Existence, Existence leads to Ontology, and sure enough, Ontology connects to Philosophy.
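Picking out “the first link not in parentheses or italics” is mechanical enough to sketch in code. What follows is a minimal illustration, not the script behind this post: it uses Python’s standard html.parser module, the names FirstLinkParser and extract_first_link are made up for the example, and the HTML snippet is a toy stand-in rather than real Wikipedia markup.

```python
from html.parser import HTMLParser


class FirstLinkParser(HTMLParser):
    """Record the first <a href> that is outside italics and parentheses."""

    def __init__(self):
        super().__init__()
        self.italic_depth = 0   # number of <i>/<em> tags currently open
        self.paren_depth = 0    # number of "(" not yet closed in the text
        self.first_link = None

    def handle_starttag(self, tag, attrs):
        if tag in ("i", "em"):
            self.italic_depth += 1
        elif tag == "a" and self.first_link is None:
            href = dict(attrs).get("href")
            if href and self.italic_depth == 0 and self.paren_depth == 0:
                self.first_link = href

    def handle_endtag(self, tag):
        if tag in ("i", "em") and self.italic_depth > 0:
            self.italic_depth -= 1

    def handle_data(self, data):
        # Track parentheses in the visible text only.
        for ch in data:
            if ch == "(":
                self.paren_depth += 1
            elif ch == ")" and self.paren_depth > 0:
                self.paren_depth -= 1


def extract_first_link(html):
    """Return the href of the first eligible link, or None if there is none."""
    parser = FirstLinkParser()
    parser.feed(html)
    parser.close()
    return parser.first_link


# Toy paragraph (hypothetical markup): the italic and parenthetical links
# should be skipped, leaving the "business" link as the first eligible one.
SAMPLE = (
    '<p><i>See <a href="/wiki/Other">other uses</a>.</i> '
    'Foo (from <a href="/wiki/Greek">Greek</a>) is a '
    '<a href="/wiki/Business">business</a> and a '
    '<a href="/wiki/Sport">sport</a>.</p>'
)
print(extract_first_link(SAMPLE))  # prints /wiki/Business
```

Real article pages also contain infoboxes, hatnotes, and citation links, so a sketch like this would need extra filtering before it could be trusted on live pages.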

What Randall Munroe discovered (along with millions of his readers shortly afterwards) is a phenomenon known as “wiki loops.” A wiki loop occurs when, by following the rule of “click the first link in a Wikipedia article not in parentheses or italics, and repeat,” you find yourself coming back to the same sequence of entries over and over, ad infinitum. In the case of Philosophy, it’s actually part of a much larger loop:

Philosophy > Pre-Socratic Philosophy > Ancient Greek Philosophy > Hellenistic Period > Ancient Greece > Civilization > Complex society > Anthropology > Human > Homo > Genus > Taxonomy (biology) > Science > Knowledge > Awareness > Consciousness > Quality (philosophy) > Philosophy

As we can see, Munroe’s choice of “Philosophy” was at least somewhat arbitrary: if every page leads to Philosophy, it also leads to Pre-Socratic Philosophy, to Human, to Knowledge, and to every other page in the loop.
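The click-and-repeat rule amounts to walking a chain until a page shows up a second time. Here is a small sketch of that idea in Python, using only the first links reported in this post as toy data. The function name follow_first_links and the FIRST_LINK dictionary are illustrative assumptions, and the links are a snapshot; live Wikipedia pages change.

```python
def follow_first_links(start, first_link):
    """Follow first links from `start` until a page repeats.

    Returns (tail, loop): the pages visited before entering the loop,
    and the pages of the loop itself, both in click order.
    Raises KeyError if a page has no recorded first link.
    """
    path = []
    position = {}  # page -> index of that page in `path`
    page = start
    while page not in position:
        position[page] = len(path)
        path.append(page)
        page = first_link[page]
    loop_start = position[page]
    return path[:loop_start], path[loop_start:]


# The seventeen-page loop as listed in this post.
PHILOSOPHY_LOOP = [
    "Philosophy", "Pre-Socratic Philosophy", "Ancient Greek Philosophy",
    "Hellenistic Period", "Ancient Greece", "Civilization", "Complex society",
    "Anthropology", "Human", "Homo", "Genus", "Taxonomy (biology)", "Science",
    "Knowledge", "Awareness", "Consciousness", "Quality (philosophy)",
]
# Each loop page points at the next one; the last points back to the first.
FIRST_LINK = {
    page: PHILOSOPHY_LOOP[(i + 1) % len(PHILOSOPHY_LOOP)]
    for i, page in enumerate(PHILOSOPHY_LOOP)
}
# The NASCAR chain described earlier in the post.
FIRST_LINK.update({
    "NASCAR": "Business", "Business": "Entity", "Entity": "Existence",
    "Existence": "Ontology", "Ontology": "Philosophy",
})

tail, loop = follow_first_links("NASCAR", FIRST_LINK)
print(len(tail), "clicks to reach the loop;", len(loop), "pages in the loop")
```

Starting from any page inside the loop returns an empty tail and the same seventeen pages, rotated to begin at the starting page, which is another way of seeing why “Philosophy” is not special among them.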

At this point, you are probably feeling a little dubious. How could it be possible that every page eventually connects to this wiki loop? Surely somewhere you’ll find another loop that links back to itself without ever reaching one of the seventeen pages in the Philosophy loop? In this case, your intuition is correct. While many common topics on Wikipedia relate to something that quickly connects to these pages (think about how many articles relate to humans, biology, science, or civilizations), there are certainly other loops out there.

A small example is the wiki loop of Coffee > Coffee Preparation > Coffee, which you can get trapped in if you begin with, say, “Espresso.” This loop, at least to me, feels somehow unsatisfying – it seems the only way to be stuck in it is to be in the world of coffee already, whereas reaching Philosophy can happen from seemingly anywhere (“James Bond” hits the loop after fourteen clicks, “Melinda Gates” after six, and “Bell Bottoms” after a measly four clicks).

I wanted to know more. First, what other loops can we find? Second, how likely are we to get into the Philosophy loop compared with other wiki loops?

To answer these questions, we turn to an incredibly helpful (and incredibly fun) resource: WikiLoopr. Plug in any starting Wikipedia page, and it will do the hard work for you. (Special thanks to Northwestern alumnus Sean Gransee ’14 for making the page available!) I played around with this for a little while with various inputs. Some interesting patterns: “President of the United States” ends on Philosophy, but “List of Presidents of the United States” gets stuck on “United States Constitution.” Until recently, all of the presidential hopefuls’ pages led to Philosophy except for Donald Trump’s page; this changed when he became the presumptive nominee. One particularly appropriate loop: “Narcissism” leads to “Vanity” and then back to “Narcissism.” As fun as this exploration was, I needed a dataset, so I looked toward automation.

I wrote a Python script that automated going to WikiLoopr with a random Wikipedia page (luckily, Wikipedia had my back with a page randomizer). Using the “selenium” package to load JavaScript-rendered content, “beautifulsoup4” to parse the HTML, and the standard “re” package for regular expressions, I collected the results from 1,000 rounds of wiki looping. To visualize the results in R, I turned them into a network (using the libraries “igraph,” “GGally,” and “ggplot2”), and…:

Fig. 2: Network of all sampled Wikipedia pages, colored by which wiki loop they end up in

…The results weren’t terribly surprising. Mr. Munroe was essentially right. Of the 1,000 starting nodes, 981 ended in the Philosophy loop, suggesting that on average you have a 98% chance of ending on that loop if you’re picking random starting points. Of the 2,996 Wikipedia pages visited by my script (including intermediate steps), 2,928 – or 97.7% – of the pages ended up leading to the Philosophy loop.

In terms of basic graph theory, all points that lead to the same loop form a “component”: a maximal set of nodes (pages) in which every node is connected to every other, directly or indirectly, when the direction of the links is ignored. Of all the pages visited, only 68 (including intermediate steps and the loops themselves) were parts of other components. These components are shown above as colored nodes, whereas the main component is represented in black. The resulting graph highlights just how dominant the Philosophy loop really is. With almost every node in black, the colored (non-Philosophy loop) nodes are quite literally edge cases.
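Because every page has exactly one first link, each component contains exactly one loop, so grouping pages by the loop they drain into recovers the components. The sketch below shows that grouping in plain Python on a made-up toy graph. It is not the R code used for the figures, and the names loop_of, components, and TOY are hypothetical.

```python
def loop_of(start, first_link):
    """Return the loop that `start` drains into, as a frozenset of pages."""
    seen = []
    page = start
    while page not in seen:
        seen.append(page)
        page = first_link[page]
    # Everything from the first repeated page onward is the loop.
    return frozenset(seen[seen.index(page):])


def components(first_link):
    """Map each loop to the set of all pages that end up in it."""
    groups = {}
    for page in first_link:
        groups.setdefault(loop_of(page, first_link), set()).add(page)
    return groups


# Hypothetical toy graph (not the scraped data): two loops plus feeder pages.
TOY = {
    "A": "B", "B": "C", "C": "A",   # a three-page loop
    "D": "A", "E": "D", "F": "C",   # pages that drain into it
    "X": "Y", "Y": "X",             # a two-page loop
    "Z": "X",                       # one page that drains into it
}

for loop, pages in components(TOY).items():
    share = len(pages) / len(TOY)
    print(sorted(loop), len(pages), f"{share:.1%}")

# The post's own figures: 2,928 of 2,996 visited pages lead to Philosophy,
# which leaves 2,996 - 2,928 = 68 pages in all other components.
print(f"{2928 / 2996:.1%}", 2996 - 2928)
```

Keying each group by the loop itself, rather than by an arbitrary label, also gives the list of loops for free, which is what the next figure isolates.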

Okay, so far we’ve succeeded in confirming the general rule that you should expect to find yourself at Philosophy. We can also see what some of the other loops are by visualizing just the loops themselves, without all the points that lead to them:

Fig. 3: Network representation of only the Wikipedia pages that are part of wiki loops

Now we’re getting somewhere. There are two major differences between the Philosophy loop and all the others. First, the sizes of the loops themselves: the Philosophy loop has seventeen nodes in total; no other loop has more than three. It makes sense that a loop with so many more entry points would catch more pages than the smaller loops do.

Second, the majority of the terms in the Philosophy loop are incredibly general. It has words like “Civilization,” “Knowledge,” “Science,” “Genus,” and of course, “Philosophy.” On the other hand, we have loops like Cebuana Lhuillier > Philippe Jones Lhuillier > Jean Henri Lhuillier (a Philippine-based pawn shop chain, its current CEO Philippe, and Philippe’s father Jean, the former chairman of Cebuana Lhuillier). These are incredibly specific. We can make the reasonable assumption that one of the keys to a loop’s success is having general terms, especially ones that involve entire branches of human cognition.

If a thousand starting seeds show us eighteen loops, then it stands to reason that increasing the number of seeds would uncover other hidden loops. Wikipedia has over 5 million articles in the English language alone, and it only takes two articles to form a loop. It would be cool to see if a larger loop can be found with a larger sample size. In case anyone else out there is as curious as I am, I’ve posted my code for data scraping and visualizing on my GitHub account. However, until proven otherwise, it seems fair to say that with few exceptions, all roads lead to philosophy.
