This blog entry continues the entry on “Authors are trumps” and looks at the top fifty play links, which score at the 99.9th percentile of shared n-grams. What can we learn from this list without actually looking at the plays? Or, if we think about it as an exercise in scalable reading, what can we learn about the plays from just looking at the pairwise combinations that we wouldn’t learn from actually reading the plays?

Here is a link to a list of the fifty play combinations that have the highest linkweight value for shared n-grams. The top three entries are in a class of their own. Jonson’s Fortunate Isles is a short masque that recycles substantial chunks from Neptune’s Triumph, another masque produced in the previous year. The first part of Killigrew’s Thomaso has a mountebank scene that liberally borrows from Volpone.  Shirley’s Contention for honour and riches (1631) and Honoria and Mammon tell you through their titles that they are close cousins.

Ten of the top fifty pairwise combinations come from two-part plays, giving rise to another “duh moment.” But we also observe that six of them have values that sit below the median for shared n-grams within the same play. Repetition declines sharply with distance.

Twenty-three, or almost half, of the outlier values involve same play repetitions by Thomas Killigrew. They also add up to almost half (23/55) of the possible links between Killigrew’s eleven plays. I have never read a play by Killigrew, and I am not sure I ever will. Here may be a case where scalable reading from a distance tells you most of what you want to know. Not only do you get the expected Cicillia I and II or Bellamira I and II, you also get pairwise combinations of Cicillia and Bellamira. We leave Killigrew with the recognition that his plays have a lot of Bellamiras, Cicillias, Claracillas, parsons, pilgrims, and prisoners.

Philip Massinger has 14 plays in the EMD corpus. Six of the 91 possible links between his plays show up in this top list. He, too, appears to be an author who likes to repeat himself.
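The link counts quoted for Killigrew and Massinger are simply the number of unordered pairs of plays, n(n-1)/2. A quick check of that arithmetic, using only the figures given above (the function name `possible_links` is made up for this illustration):

```python
from math import comb


def possible_links(n_plays: int) -> int:
    """Number of unordered play pairs among n_plays plays: n * (n - 1) / 2."""
    return comb(n_plays, 2)


killigrew_links = possible_links(11)  # eleven Killigrew plays
massinger_links = possible_links(14)  # fourteen Massinger plays

print(killigrew_links, massinger_links)

# Share of each author's possible links that reach the top-fifty list
print(round(23 / killigrew_links, 2), round(6 / massinger_links, 2))
```

Seen this way, Killigrew's 23 of 55 is a far larger share of his possible links than Massinger's 6 of 91.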

Gascoigne’s Supposes and Glass of Government

Gascoigne’s Supposes and Glass of Government share two heptagrams and two nonograms as well as an octogram. Here they are:

  • from bodily perils in the cradle from danger of
  • this is somewhat yet for by this means I
  • oh that I could tell where to find
  • my mind gives that I shall
  • but be you sure that I shall
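For readers who want to see what "sharing an n-gram" means mechanically, here is a minimal sketch. It is not the pipeline behind the EMD figures, which is not described in this entry. It assumes a crude lowercase word tokenizer and works on two invented toy sentences built around one of the phrases listed above. The names `tokenize`, `ngrams`, and `shared_ngrams` are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (a crude assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All distinct runs of n consecutive tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(text_a: str, text_b: str, n: int) -> list[str]:
    """N-grams of length n that occur in both texts, as sorted strings."""
    common = ngrams(tokenize(text_a), n) & ngrams(tokenize(text_b), n)
    return sorted(" ".join(gram) for gram in common)


# Toy sentences invented for illustration; only the shared phrase is from the list.
play_a = "Alas, oh that I could tell where to find my master"
play_b = "Oh that I could tell where to find him now"

print(shared_ngrams(play_a, play_b, 8))
```

Running the same function with n set to 7 or 9 shows how a single long overlap also produces shorter overlapping n-grams, which any real count has to decide how to handle.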

The five shared phrases are not especially remarkable, but there are only 1630 repeated n-grams that are longer than six words. The odds are low that five of them will show up at random in one of some 50,000 possible play links. A similar point can be made about the 41 tetragrams and pentagrams that are shared between the two plays. There are about 150,00 such n-grams in the EMD corpus. A simulation of their random distribution across all possible play combinations gives this pattern:



You would have to play this game a very long time before generating a random result with a play link that shares 41 hits.
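The logic of such a random game can be sketched in a few lines. This is a toy model and not the simulation referred to above, whose details are not given here. It assumes that each repeated n-gram lands on exactly one play link chosen uniformly at random, and it plugs in the 1630 long n-grams and roughly 50,000 play links mentioned earlier. The name `simulate_hits` is hypothetical.

```python
import random
from collections import Counter


def simulate_hits(n_ngrams: int, n_links: int, seed: int = 0) -> Counter:
    """Drop each n-gram on one uniformly random play link; count hits per link.

    Assumption: placements are independent and uniform, which real n-grams
    are not. The model only gives a baseline for chance co-occurrence.
    """
    rng = random.Random(seed)
    return Counter(rng.randrange(n_links) for _ in range(n_ngrams))


N_LONG_NGRAMS = 1630   # repeated n-grams longer than six words (from the text)
N_PLAY_LINKS = 50_000  # approximate number of possible play links (from the text)

hits = simulate_hits(N_LONG_NGRAMS, N_PLAY_LINKS, seed=1)

print("mean hits per link:", N_LONG_NGRAMS / N_PLAY_LINKS)
print("links with at least one hit:", len(hits))
print("largest number of hits on any one link:", max(hits.values()))
```

Under these assumptions the average link receives about 0.03 hits, so a link with five long n-grams, let alone 41 shorter ones, is very hard to produce by chance. Changing the seed and rerunning gives a feel for how rarely the maximum climbs.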

Two-Part Plays

Here is a quick tabulation of two-part plays, ignoring Thomas Killigrew:

author  playlink               linkweight
marl    tamb1_tamb2            63.78
heyt    1edw4_2edw4            31.44
sha     2h6_3h6                29.12
heyt    ironage1_ironage2      28.86
sha     1h6_2h6                26.74
heyt    maidwest1_maidwest2    25.49
sha     1h6_3h6                21.14
sha     1h4_2h4                18.64
dekker  honwhore1_honwhore2    14.83
heyt    knownotme1_knownotme2  11.21

Unsurprisingly, all these play links share many n-grams. All of them sit above the 99th percentile. Marlowe’s Tamburlaine stands out as a play of unusual repetitiveness. The two plays do not share many long n-grams. The hexagram “the Turk and his great empress” is the longest, and there are only five pentagrams, including

  • region of the air and
  • the great and mighty Tamburlaine
  • a terror to the world
  • Turk and his great empress
  • and terror of the world
  • fill all the world with
  • the monarch of the east

But there are 33 tetragrams and 56 trigrams that occur only in the two parts of Tamburlaine.


It is instructive to compare this with the second and third parts of Henry VI. There are more long repetitions, though they are rather bland:

  • what news why comest thou in such
  • out some other chase for I myself
  • the son of Henry the fifth
  • the duke of York is
  • the duke of York I
  • king at nine months old
  • in the heart of France
  • for the duke of York
  • and myself with all the

There are only thirteen shared tetragrams and 35 trigrams. Some basic facts about the distinct rhetorical timbre of Marlowe’s play emerge very clearly from these simple comparisons. The point is underscored by the fact that with the exception of Edward II and the Massacre at Paris, the other same-author values for Marlowe sit below the median. That is a topic for another blog entry.