Shakespeare’s dislegomena are lemmata that occur in only two of his plays. I use ‘dislegomenon’ in a specialized sense to refer to document rather than collection frequency: what counts is the number of plays in which a lemma appears, not its total number of occurrences. For instance, the lemma ‘Laertes’ occurs once in *Titus Andronicus* and 33 times in *Hamlet*. In *Titus Andronicus*, the name occurs in the context of disputed burial, and Odysseus, periphrastically named “wise Laertes’ son,” is seen as a model of *pietas* and, implicitly, of filial obligation. The earlier passage tells us something about Shakespeare’s choice of name in *Hamlet*. The unique link between *Titus Andronicus* and *Hamlet* established by the word is more important than its overall frequency.

There are 2160 Shakespearean lemmata that occur in exactly two plays. They add up to about 12.6% of the 17,153 lemmata that are found in the plays. What can we learn from looking at their distribution over the Shakespeare corpus? Expect some “duh-moments” in what follows. Their frequency legitimates the occasional “oh-moment.” If quantitative analysis often merely tells us what we already know, it strengthens our confidence that there may be something to outliers or otherwise surprising results.

The 37 plays in my Shakespeare corpus create 666 pairwise combinations according to the formula n × (n − 1) / 2, here 37 × 36 / 2. Each pairwise combination is associated with a time lapse, and so is each dislegomenon. Thus ‘ween’, which occurs in *1 Henry VI* (1589) and *Henry VIII* (1612), has a time lapse of 23 years, while ‘voluptuousness’, occurring in *Macbeth* (1606) and *Antony and Cleopatra* (1606), has a time lapse of 0 years. (For the purposes of this analysis each play has been assigned to a single year. Particular dates may be off by +/- 1, but the overall analysis is unlikely to be affected by such errors.)
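The pair count and the two time lapses quoted above can be checked with a few lines of Python. This is only a sketch: the `dates` dictionary holds just the four plays dated in this post, and the function names are my own illustrative choices.

```python
from itertools import combinations

# Dates as given in the post; only four of the 37 plays are listed here.
dates = {
    "1 Henry VI": 1589,
    "Henry VIII": 1612,
    "Macbeth": 1606,
    "Antony and Cleopatra": 1606,
}

def pair_count(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2

def time_lapses(dates):
    """Map each unordered pair of plays to the absolute gap in years."""
    return {
        (a, b): abs(dates[a] - dates[b])
        for a, b in combinations(dates, 2)
    }

lapses = time_lapses(dates)
print(pair_count(37))                                  # 666
print(lapses[("1 Henry VI", "Henry VIII")])            # 23
print(lapses[("Macbeth", "Antony and Cleopatra")])     # 0
```

With all 37 plays in `dates`, `time_lapses` would return the full set of 666 pairs.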

In a first experiment we look at the distribution of dislegomena by time lapse. The blue line in the chart below shows the actual distribution. The red line is a random distribution created by putting the time lapse values of each pairwise combination in a hypothetical “urn” and drawing 2160 samples with replacement. The result is a “duh-moment”: dislegomena occur five times as often within a three-year span as one would expect in a random distribution.
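The urn procedure might be sketched as follows. The list `toy_lapses` is a made-up stand-in for the 666 real time-lapse values, and the sketch assumes every pairwise combination is equally likely to be drawn, which the description above suggests but does not state outright.

```python
import random
from collections import Counter

def urn_baseline(pair_lapses, n_draws, seed=0):
    """Draw n_draws time lapses with replacement, each pair equally likely.

    Returns a Counter mapping time lapse -> number of draws.
    """
    rng = random.Random(seed)  # fixed seed so the baseline is reproducible
    draws = rng.choices(pair_lapses, k=n_draws)
    return Counter(draws)

# Hypothetical lapses, one entry per pairwise combination.
toy_lapses = [0, 1, 1, 3, 7, 12, 23]

baseline = urn_baseline(toy_lapses, 2160)
for lapse in sorted(baseline):
    print(lapse, baseline[lapse])
```

Comparing these counts with the observed number of dislegomena at each time lapse gives the contrast between the red and the blue line.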

In a second experiment we look at the distribution of dislegomena across the 666 possible pairwise combinations. Shakespeare’s plays differ considerably in length, ranging from 14,365 words (*Comedy of Errors*) to 29,530 words (*Hamlet*). Thus we need to normalize raw counts. We do so by adding the word counts of the two plays in each pairwise combination and expressing the frequency of dislegomena as frequency per 10,000 words.
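The normalization is simple arithmetic, sketched here with the two word counts given above. The shared count of 3 is a hypothetical value chosen for illustration, not a figure from the analysis.

```python
def per_10k(shared_count, words_a, words_b):
    """Shared dislegomena per 10,000 words of the two plays combined."""
    return shared_count * 10_000 / (words_a + words_b)

# Word counts from the post; the shared count is hypothetical.
comedy_of_errors = 14_365
hamlet = 29_530
hypothetical_shared = 3

rate = per_10k(hypothetical_shared, comedy_of_errors, hamlet)
print(round(rate, 2))  # 0.68
```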

The histogram of their distribution shows that dislegomena are relatively rare phenomena. The interquartile range of 0.28 to 0.97 per 10,000 words translates into raw counts of between one and five shared dislegomena per pair of plays.

To compare combinations we use the z-score, a simple and abstract statistic that subtracts the average from the actual value and divides the result by the standard deviation. In a normal distribution you expect roughly the 2.5th and 97.5th percentile values to sit about two standard deviations below or above the average, and in terms of human size those values stand for “rather short” or “quite tall.”
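As a minimal sketch, the z-score calculation looks like this. The rates are invented toy values, and the use of the population standard deviation is my assumption; with 666 combinations the choice between population and sample standard deviation makes almost no difference.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (value - mean) / standard deviation for each value."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]

# Hypothetical normalized frequencies for six pairwise combinations.
toy_rates = [0.2, 0.3, 0.5, 0.9, 1.0, 3.1]

zs = z_scores(toy_rates)
# Indices of combinations more than two standard deviations above average.
outliers = [i for i, z in enumerate(zs) if z > 2]
print(outliers)
```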

A look at pairwise combinations with z-scores over two leads to another “duh-moment”:

The duh-moment arises from the fact that four of the top ten combinations are two-part history plays, and another three are joined through the characters of Falstaff or Hal. But a little “oh-moment” arises from the observation that a handful of plays are as closely related as if they were two parts of the same play. Temporal proximity may account for some combinations, e.g. *The Taming of the Shrew* and *Love’s Labor’s Lost*. On the other hand, many other combinations with equally low time lapses have much lower z-scores. The combination of *Hamlet* and *Cymbeline* stands out and invites some detailed investigation. Even more striking is the fact that *Hamlet* appears more often in these combinations than any other play. Length cannot be the reason because we have at least partially filtered it out through normalized frequencies.

It will take future blogs to look in detail at some dislegomena. In the meantime this survey may help in defining the quantitative parameters of the remarkable.

Thanks very much for this and the analyses above in other posts. These observations are really revealing and illuminating at the same time.

While reading, though, I wondered whether it is important to consider which editions you used for the analyses. Did you use digitised early prints or modern editions? If the latter, which editions? If the former, which early editions, and how could the analysis account for the variety of spellings of the same word?

Thanks again for the post.