Shakespeare’s dislegomena are lemmata that occur in only two of his plays. I use ‘dislegomenon’ in a specialized sense to refer to document rather than collection frequency: what counts is the number of plays in which a lemma appears, not its total number of occurrences. For instance, the lemma ‘Laertes’ occurs once in Titus Andronicus and 33 times in Hamlet. In Titus Andronicus, the name occurs in the context of disputed burial, and Odysseus, periphrastically named “wise Laertes’ son,” is seen as a model of pietas and, implicitly, of filial obligation. The earlier passage tells us something about Shakespeare’s choice of name in Hamlet. The unique link between Titus Andronicus and Hamlet established by the word is more important than its overall frequency.

There are 2160 Shakespearean lemmata that occur in two plays. They add up to 12.6% of the 17,153 lemmata that are found in the plays. What can we learn from looking at their distribution over the Shakespeare corpus? Expect some “duh-moments” in what follows. Their frequency legitimates the occasional “oh-moment.” If quantitative analysis often merely tells us what we already know, it strengthens our confidence that there may be something to outliers or otherwise surprising results.

The 37 plays in my Shakespeare corpus create 666 pairwise combinations according to the formula (n-1)*n/2, or 36*18.5. Each pairwise combination is associated with a time lapse, and so is each dislegomenon. Thus ‘ween’, which occurs in 1 Henry VI (1589) and Henry VIII (1612), has a time lapse of 23 years, while ‘voluptuousness’, occurring in Macbeth (1606) and Antony and Cleopatra (1606), has a time lapse of 0 years. (For the purposes of this analysis each play has been assigned to one year; while particular dates may be off by +/- 1 year, the overall analysis is unlikely to be affected by such errors.)
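
The arithmetic can be checked with a minimal Python sketch. The four dates are the ones given above; the function names pair_count and time_lapse are my own labels for illustration, not part of any existing tool.

```python
from itertools import combinations

# Dates as given in the text; each play is assigned to a single year.
play_years = {
    "1 Henry VI": 1589,
    "Henry VIII": 1612,
    "Macbeth": 1606,
    "Antony and Cleopatra": 1606,
}

def pair_count(n):
    """Number of unordered pairs among n items: (n - 1) * n / 2."""
    return (n - 1) * n // 2

def time_lapse(play_a, play_b, years):
    """Absolute difference in years between the dates of two plays."""
    return abs(years[play_a] - years[play_b])

# Both ways of counting the pairs among 37 plays agree.
print(pair_count(37))                           # 666
print(len(list(combinations(range(37), 2))))    # 666

print(time_lapse("1 Henry VI", "Henry VIII", play_years))          # 23
print(time_lapse("Macbeth", "Antony and Cleopatra", play_years))   # 0
```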

In a first experiment we look at the distribution of dislegomena by time lapse. The blue line in the chart below shows the actual distribution. The red line is a random distribution created by putting the time lapse values of each pairwise combination in a hypothetical “urn” and drawing 2160 samples with replacement. The result is a “duh-moment”: dislegomena occur five times as often within a three-year span as one would expect in a random distribution.

 

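[Figure: distribution of dislegomena by time lapse; blue line = actual distribution, red line = random distribution]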
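
The urn experiment can be sketched in a few lines of Python. This is an illustration of the sampling idea rather than the procedure actually used for the chart: the five dates are made up, and the names pairwise_lapses and draw_from_urn are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

def pairwise_lapses(years):
    """Time lapse for every unordered pair of plays (the contents of the urn)."""
    return [abs(a - b) for a, b in combinations(years, 2)]

def draw_from_urn(lapses, n_draws, seed=0):
    """Draw n_draws lapse values with replacement and tally how often each occurs."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return Counter(rng.choices(lapses, k=n_draws))

# Hypothetical dates for five plays, for illustration only.
toy_years = [1590, 1591, 1595, 1600, 1610]

urn = pairwise_lapses(toy_years)            # 10 pairs from 5 plays
expected = draw_from_urn(urn, n_draws=2160)

# Tally of randomly drawn lapses, to be compared with the observed tally.
for lapse in sorted(expected):
    print(lapse, expected[lapse])
```

With the real corpus the urn would hold 666 lapse values, one per pairwise combination, and the tally of the 2160 draws would be set against the tally of the observed dislegomena.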

 

In a second experiment we look at the distribution of dislegomena across the 666 possible pairwise combinations. Shakespeare’s plays differ considerably in length, ranging from 14,365 words (Comedy of Errors) to 29,530 words (Hamlet). Thus we need to normalize raw counts. We do so by adding the word counts of the two plays in each pairwise combination and expressing the frequency of dislegomena as frequency per 10,000 words.
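
As a worked example of the normalization, here is a minimal Python sketch. The two word counts are the ones quoted above; the count of three shared dislegomena is a made-up figure for illustration, and per_10k is a hypothetical helper name.

```python
def per_10k(shared_dislegomena, words_a, words_b):
    """Dislegomena shared by two plays per 10,000 words of their combined length."""
    return shared_dislegomena * 10_000 / (words_a + words_b)

# Word counts from the text; the count of 3 shared dislegomena is hypothetical.
rate = per_10k(3, 14_365, 29_530)
print(round(rate, 2))  # 0.68
```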

The histogram of their distribution shows that dislegomena are relatively rare phenomena. The interquartile range of 0.28 to 0.97 per 10,000 words translates into raw counts of between one and five shared dislegomena per pair of plays.

[Figure: histogram of dislegomena per 10,000 words across the 666 pairwise combinations]

The z-score is a simple and abstract statistic that subtracts the average from the actual value and divides the result by the standard deviation. In a normal distribution about 95% of values lie within two standard deviations of the average, so values two standard deviations below or above it sit at roughly the 2.5th and 97.5th percentiles; in terms of human size those values stand for “rather short” or “quite tall.”
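
A minimal Python sketch of the calculation follows. The rates are made-up numbers, and the choice of the population standard deviation (pstdev) rather than the sample standard deviation is an assumption on my part.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Subtract the average from each value and divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in values]

# Hypothetical normalized rates for eight pairwise combinations.
rates = [2, 4, 4, 4, 5, 5, 7, 9]
scores = z_scores(rates)

# The last value lies two standard deviations above the average of 5.
print(scores[-1])  # 2.0
```

Note that the function divides by zero if all values are identical, which cannot happen with rates that vary as widely as these do.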

A look at pairwise combinations with z-scores over two leads to another “duh-moment”:

[Figure: pairwise combinations with z-scores over two]

The duh-moment arises from the fact that four of the top ten combinations are two-part history plays, and another three are joined through the characters of Falstaff or Hal. But a little “oh-moment” arises from the observation that a handful of plays are as closely related as if they were two parts of the same play. Temporal proximity may account for some combinations, e.g. The Taming of the Shrew and Love’s Labor’s Lost. On the other hand, many other combinations with equally low time lapses have much lower z-scores. The combination of Hamlet and Cymbeline stands out and invites some detailed investigation. Even more striking is the fact that Hamlet appears more often in these combinations than any other play. Length cannot be the reason because we have at least partially filtered it out through normalized frequencies.

It will take future blog posts to look in detail at some dislegomena. In the meantime this survey may help in defining the quantitative parameters of the remarkable.