Notes on NUDHL #8: Defining DH @ NU

A great big thanks to Emily, Amanda, Andrew, Kevin, Beth, and all the other HASTAC@NUDHL Scholars for organizing a productive meeting!

A few “takeaways”:

We need to move from talking just about graduate education and DH in an all-purpose way to (1) the different stages of graduate education (1st year vs. midpoint vs. home stretch) and DH, and (2) where the intersections are among disciplines in graduate training and where it's good for different disciplines to be just that: different—and then how DH can help to mediate those points of convergence and divergence in productive ways.

To me, the example of Andrew's research encourages graduate students to (1) go for it with whatever tools you have at your disposal…dive in and explore; dead ends and negative results can be just as productive as brilliantly clear breakthroughs, and digital humanities can encompass toying with MS Word or even pen and paper…"Why not?!"; (2) think about the digital both as a tool of analysis (screwmeneutics, heuristic, analytic in the literal sense) and as a tool for visualization, narration, and dramatizing an argument you have already articulated; (3) talk with others, consult, converse, seek new perspectives on your research questions, go deeper into your specialty, and explore more broadly across different methods; and (4) explore the visualization of a poem, the poem as a visual object, comparative work through the digital, layering versions, and other ways that the digital is quite literally (literature-istically?) fertile, productive of new readings and interpretations.

Josh’s presentation reminded me that there is an emerging network or constellation of people, resources, projects, and interests at Northwestern. The challenge is how to give these better “definition,” more support, more instances of connection and elaboration.

– Michael

Design and the Digital Humanities CFP

Co-organized by our own mighty NUDHL contributor Josh Honn!
Topic: Design and the Digital Humanities

With this year's M/MLA topic of "Art & Artifice," the new Permanent Section on Digital Humanities will explore issues of, experiments with, and provocations on design. Digital humanities (DH) is often equated with tool-oriented, procedural tasks like text analysis and data gathering. For example, the recent MLA open-access publication Literary Studies in the Digital Age focuses on textual databases, mining, analysis, and modeling. However, Johanna Drucker, Anne Burdick, Bethany Nowviskie, Tara McPherson, and others have argued that interface and systems design, visual narrative, and graphical display are not peripheral concerns, but rather important "intellectual methods" (Burdick et al. 2012). Likewise, DH projects and publications often segment (content first, design last) and/or outsource (hire a firm, select a template) the design process, overlooking the powerful and important dialectic of design and argument, at times to the great detriment of the project itself. In an effort to further the conversation, we invite papers related to any aspect of design and the digital humanities. Possible topics/questions may include, but are certainly not limited to:

  • design of interactive fiction, hypertext fiction, and electronic literature
  • games and virtual spaces
  • hybrid digital/analog fabrication practices and the ethos of hacking, making, and crafting that surrounds them
  • tensions between original designs and prefabricated templates and visualizations
  • the relationship between content and design in a scholarly edition, web archive, course website, or other digital content management project
  • design and affect, design and imagination
  • the tendency of DH project groups to separate designers and programmers on a team; tendency to divide design concerns from “technical” concerns
  • design standards, web standards, responsive & participatory design, and issues of accessibility of online publications and projects
  • skeuomorphism vs. born-digital design?
  • design and code as language art, code poetry, etc.?

Please send 250-word abstracts by May 31st to both Josh Honn (josh.honn@gmail.com) and Rachael Sullivan (sullivan.rachael@gmail.com).
Co-chairs: Josh Honn (Northwestern University) and Rachael Sullivan (University of Wisconsin-Milwaukee)

Martin Mueller on “Morgenstern’s Spectacles or the Importance of Not-Reading”

X-Posted from Martin Mueller’s Scalable Reading Blog:

[I recently stumbled across the draft of a talk I gave at the University of London in 2008. It strikes me as a still relevant reflection on what then I called “Not-Reading” and now prefer to call “Scalable Reading.” I reprint it below with very minor corrections and additions.]

Coming from Homer: the allographic journey of texts and the query potential of the digital surrogate

For the past decade my work has revolved around what I call the ‘allographic journey of texts’ and the ‘query potential of the digital surrogate’. The stuff I am interested in has been around for a long time. I have written a book about the Iliad, another book about the transformation of Greek tragedy by European poets from 1550 to 1800, and I have written a number of essays about Shakespeare that never quite grew into a book.

None of my earlier work required or benefited from a computer. My first book was typeset on a computer, but the work on it was done in the North Library of the British Museum, taking notes by hand. The copy of my book on the Iliad was prepared on a dedicated word processor of the early eighties. Since the mid eighties I have written everything on a personal computer of some sort. Like everybody else, I don’t see how I could possibly do my work without a computer, but do I really write better as a result? If we had to return to pen and paper would we write worse, or even fewer, books?

On the other hand, Nietzsche, when shown an early typewriter, said "Unser Schreibzeug arbeitet auch an unseren Gedanken mit" ("our writing tools also work on our thoughts"), and it seems implausible that tool and content exist independently of each other. The 'what' of a thing and the 'how' of its creation and reception are likely to be interwoven at some level.

My interest in technology was at first administratively driven. As chair of my English department in the eighties I took a strong interest in using technology to improve what seemed rather antiquated routines for creating or keeping records. We were the first humanities department to have a dedicated word processor, and later we had the first network that allowed faculty to print their stuff on a Laser printer from either a PC or a Mac. Big stuff in those days — at least in the context of English departments.

My scholarly interest in technology grew out of my work on Homer. What is the relationship between orality and literacy in the creation and transmission of the Iliad and Odyssey? I was from the beginning drawn to, and have never strayed from, the hypothesis that these works are hybrids and that their distinctive features result from the productive encounter of two different technologies of the word, the ‘oral’ and the ‘literate’.

The history of the Homeric poems is an ‘allographic journey’. I take the term ‘allographic’ from Nelson Goodman’s Languages of Art, where he distinguishes between ‘autographic’ works (Michelangelo’s David) and ‘allographic’ works, whether a Shakespeare sonnet or the score of Appassionata. The allographic work can always be written differently, and in theory, the rewriting makes no difference. But in practice, there is some difference if only because an allographic change is likely to involve a change in our mode of access to the work.

If we try to imagine Homeric verse in its original setting, the best evidence is probably Odysseus' account of his adventures (Odyssey 9-12). It is a performance on roughly the same scale as a Shakespeare play or Verdi opera, with an intermission and a spell-bound audience. That is very different from reading the Venetus A manuscript of the Iliad or the print edition of Venetus A, where the text is surrounded and sometimes drowned by the marginal scholia. It is different again from reading the Iliad in a Teubner, Budé, or OCT version, where the consistency of format and typography across many authors of a canon facilitates access but also levels difference. You can and should abstract from the accidentals of presentation, and the more standardized the accidentals are, the easier it is to perform that abstraction. But over time, the shared accidentals acquire a power of their own: if you have spent a lifetime with the blue buckram Oxford Classical Text copies of Homer, Herodotus, and Plato, you end up believing at some level that they were written that way, when in fact none of these authors could make head or tail of what they would see on any of those pages.

An interest in the conditions of reception led me to think about the role of surrogates. Our typical encounter with a text is through a surrogate — setting aside whether there is an original in the first place. Even assuming that you own a 'real' Shakespeare folio and that it is closer to the 'original', you would still use a Bevington, Riverside, or Norton text most of the time, and not only because you are afraid to damage it. Every surrogate has its own query potential, which for some purposes may exceed that of the original.

Thinking along those lines led me to the Chicago Homer. An oral style is shot through with echoes that the original audiences picked up over a lifetime of listening. The 19th century German school editions of Ameis-Hentze dutifully and comprehensively mark approximate or complete echoes, and with enough patience you can work your way through them. Could we use the Web to create a visual simulation of the network of repetitions and make modern readers ‘see’ what Homer’s audience heard? Starting from that question Craig Berry constructed a database of all repeated phrases, and Bill Parod wrote an interface that allowed you to filter and display repeated phrases while exploring the neural networks of bardic memory. You could also get very quick and accurate answers to questions that are very hard to ask in a print environment, such as “what repetitions are shared by the first and last books of the Iliad but occur nowhere else?”
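To make the shape of such a query concrete, here is a minimal sketch, in Python, of the underlying idea: index every repeated phrase by the set of books in which it occurs, then ask for phrases whose set of books is exactly the one you care about. The phrases and book numbers below are invented for illustration; the Chicago Homer's actual data model and interface are, of course, far richer.

    # Toy sketch of a "repetitions shared by books X and Y but nowhere else" query.
    # The phrase-to-books index below is invented; the real Chicago Homer indexes
    # every repeated phrase in the Homeric corpus.
    phrase_books = {
        "swift-footed Achilles": {1, 9, 16, 24},   # hypothetical entries
        "ransom beyond counting": {1, 24},
        "rosy-fingered dawn": {1, 8, 24},
    }

    def shared_only_in(phrase_index, wanted_books):
        """Return phrases occurring in every wanted book and in no other book."""
        wanted_books = set(wanted_books)
        return [phrase for phrase, books in phrase_index.items()
                if books == wanted_books]

    # "What repetitions are shared by the first and last books of the Iliad
    #  but occur nowhere else?"
    print(shared_only_in(phrase_books, {1, 24}))  # -> ['ransom beyond counting']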

My experience with the Chicago Homer shaped my view of what digital projects could or should do. A worthwhile digital project must let users do things that are hard or impossible to do with the source object in its original form.

We are now in the midst of another and deeply consequential change in the allographic journey of texts. Between the late fifteenth and the mid sixteenth century an astonishing percentage of the European cultural heritage moved from circulating in manuscripts to circulating in print. There is a tipping point in changes of this kind. Once enough texts have migrated, the new medium comes to dominate circulation. What exists only in the old medium is increasingly ignored.

I had a striking demonstration of this last summer when I revised my 1984 book on the Iliad for a second edition scheduled to come out next year (2009). I did an online bibliographical search and then asked myself: "What do I miss if I restrict my reading of articles to items that are in JSTOR and ignore stuff that exists only in print, unless it is repeatedly referred to as important but not sufficiently well summarized in reviews or other discussions?" The answer to that is "not very much." In many fields of the humanities, the allographic migration of journals to a digital medium has clearly gone beyond the tipping point.

With regard to the primary texts that are the focus of attention in the document-centric disciplines in the humanities, the latest phase in their allographic journey raises the question of ‘the query potential of the digital surrogate’. What can you do with the digital text that you cannot do with its printed source? What steps did the digitizers take to maximize its query potential in its new form? What new tools are available to take advantage of a properly digitized text?

A sermon on five texts

In talking about these questions, I'd like to take as my point of departure a handful of quotations that keep running through my mind. In ways that I don't quite understand myself, they map out the field of my reflections. The first of them is a poem by Christian Morgenstern, an early twentieth-century German poet famous for his nonsense poems, many of which bear witness to his philosophical and mystical leanings:

Die Brille / The Spectacles

Korf liest gerne schnell und viel;
darum widert ihn das Spiel
all des zwölfmal unerbetnen
Ausgewalzten, Breitgetretnen.

Korf reads avidly and fast.
Therefore he detests the vast
bombast of the repetitious,
twelvefold needless, injudicious.

Meistens ist in sechs bis acht
Wörtern völlig abgemacht,
und in ebensoviel Sätzen
läßt sich Bandwurmweisheit schwätzen.

Most affairs are settled straight
just in seven words or eight;
in as many tapeworm phrases
one can prattle on like blazes.

Es erfindet drum sein Geist
etwas, was ihn dem entreißt:
Brillen, deren Energieen
ihm den Text – zusammenziehen!

Hence he lets his mind invent
a corrective instrument:
Spectacles whose focal strength
shortens texts of any length.

Beispielsweise dies Gedicht
läse, so bebrillt, man – nicht!
Dreiunddreißig seinesgleichen
gäben erst – Ein – – Fragezeichen!!

Thus, a poem such as this,
so beglassed one would just — miss.
Thirty-three of them will spark
nothing but a question mark.

The second is a quotation from Father Busa, as posted by Willard McCarty on the Humanist listserv:

“the use of computers in the humanities has as its principal aim the enhancement of the quality, depth and extension of research and not merely the lessening of human effort or time.”

The third is Ranganathan’s fourth law of library science:

Save the time of the reader.

The fourth is a quotation from Douglas Engelbart’s 1962 essay Augmenting Human Intellect, to which John Bradley drew my attention:

You’re probably waiting for something impressive. What I’m trying to prime you for, though, is the realization that the impressive new tricks all are based upon lots of changes in the little things you do. This computerized system is used over and over and over again to help me do little things–where my methods and ways of handling little things are changed until, lo, they’ve added up and suddenly I can do impressive new things. (p.83)

The final quotation comes from Laplace’s Essai philosophique sur les Probabilités:

On voit, par cet Essai, que la théorie des probabilités n’est, au fond, que le bon sens réduit au calcul; elle fait apprécier avec exactitude ce que les esprits justes sentent par une sorte d’instinct, sans qu’ils puissent souvent s’en rendre compte.

One sees in this essay that the theory of probability is at bottom nothing but common sense reduced to calculus; it makes you appreciate with exactitude what judicious minds have sensed through a kind of instinct, often without being able to account for it.

Morgenstern and the prospects of a wide document space

Morgenstern’s Spectacles offer a nice way of focusing on the most distinctive query potential of digital text archives: their increasing size and their attendant promise to support analytical operations across far more text than you could possibly read in a lifetime. In the world of business, science, and espionage, the condensing power of digital spectacles is ceaselessly at work, extracting bits of knowledge from vast tailings of bad prose.

Google specializes in what you might want to call Morgenstern's goggles. It lets you look for very small needles in very large haystacks. If there are many needles and you are like the Prince of Arragon in The Merchant of Venice, Google's algorithms do a brilliant job of bringing you "what many men desire," condensing onto the first result page, out of millions of returns, the hits that are most likely to be needed right now. There are many occasions in everyday life and scholarly contexts when this shallow but extraordinarily powerful search model works very well. But it is far from a complete model of the kinds of inquiry digital text archives are in principle capable of supporting.

The self-deprecating turn at the end of Morgenstern's poem may be seen as a prophetic criticism of 'knowledge extraction'. Why does the poem remain unreadable and in the aggregate yield just one question mark? Is that the fault of the poem, or do some things elude these spectacles, or at least their ordinary uses?

Father Busa, Ranganathan and Douglas Engelbart

Let us come back to this question a little later and look at Father Busa's observation from the perspectives of Ranganathan and Douglas Engelbart. It ought to be true that making things easier or faster is not good enough. What is the point of a university spending money on digital tools and resources if they do not produce better research by extending the scope of materials or methods and deepening the focus of inquiry?

But doubts arise if you look at this statement from the perspectives of Ranganathan's fourth law of library science and Douglas Engelbart's famous essay "Augmenting Human Intellect". Ranganathan's fourth law says simply "Save the time of the reader." Much of what librarians do falls squarely under this heading. If books are neatly catalogued, the catalogue records are kept in file drawers, the books are kept on clearly labeled shelves and are reshelved promptly after use, readers can minimize the time cost of locating books and spend the saved time on reading them. In the progressive digitization of research libraries Ranganathan's fourth law has found many new and powerful applications.

If you are ascetically inclined you might argue that scholarship is like the pinot noir grape and will produce its best results only under adverse conditions, whether in Burgundy or Oregon. There may be a downside to things being too easy, but much good research is hampered by things that take too long, are too expensive, or involve too much hassle. Thus the "mere" "lessening of human effort or time" certainly has the potential for enhancing the "quality, depth and extension of research." Whether it will necessarily do so is of course another question.

Questioning the distinction between mere convenience and transformative changes is the major point of "Augmenting Human Intellect" by Douglas Engelbart, the inventor of the computer mouse. His insistence on the cumulative impact of "lots of changes in the little things you do" is an example of Divide and Conquer at its best. Transformation is the result of incremental and typically minuscule change. When we try to evaluate the impact of digitization on scholarship in the humanities it is important to keep that truth in mind.

Rachel’s salamanders

It is not easy to measure that impact. Everybody uses computers in the ordinary course of university work. In just about all disciplines, key generic research activities have gone digital. The bibliographical control of secondary literature and access to the journal literature have largely become an online business. This is of course part of research, but in a stricter sense "research computing" involves ways in which researchers use computers to manipulate their primary objects of attention. More accurately, computers never manipulate objects directly. They manipulate bits of the world translated into the 'bits' of binary digits. The key concept here is the "query potential of the digital surrogate." Disciplines differ remarkably in their use of and dependence on digital surrogates. Some aspects of the actual or potential use of such surrogates in the humanities are well illustrated by a look at evolutionary biology, a discipline that has many structural and historical affinities with philology.

I know a little about the Museum of Vertebrate Zoology at Berkeley because my daughter worked there for a while. You can walk along shelves and shelves of salamander specimens, meticulously prepared and labeled by generations of field biologists going back to the 1800s. These are, if you will, surrogates of living animals, and the labels (metadata in current jargon) are a minimal representation of their environment. Working with such specimens, with or without a microscope, is not unlike working with books.

There are projects to digitize such collections by creating digital representations of the specimens and by harmonizing the metadata across collections so that the specimens at Berkeley and in the Field Museum exist in a single "document space," searchable by scientists anywhere anytime.

Such a document space is a new kind of digital surrogate that makes many inquiries more convenient. Whether it enables “new” inquiries is open to question. You could after all think of yourself as imitating Jack Nicholson in Prizzi’s Honor, shuttling by plane between Chicago and Berkeley and rewarding yourself with dinners at Chez Panisse or Charlie Trotter’s on alternate nights. On the other hand, for a graduate student on a modest budget somewhere in Transylvania the query potential opened up by this digital surrogate may be the gateway to a successful career.

As part of her work, my daughter extracted DNA from some of these specimens, fed the DNA sequences into a collaborative gene bank, and used a comparative analysis of the sequences to formulate new hypotheses about the descent of certain kinds of salamander families. As a generic research problem, this is a very familiar story to any humanities scholar who has ever traced the affiliations of different representations of the "same" object, whether a text, a score, or an image. But this particular representation of salamanders, and the subsequent manipulation of that representation, are impossible without digital technology. You either do it with a computer or you do not do it at all.

Over the course of my daughter's career as a graduate student, the time cost of analyzing DNA sequences on a computer dropped from weeks to hours. Ten years earlier, her work would for all practical purposes have been impossible. If you take away the computer you cripple projects like Rachel's. Such projects certainly meet Father Busa's requirement that the computer affect the "depth and extension of research."

It is not clear how much research in the humanities either meets the definition of research computing in this stringent sense or depends on the digital manipulation of its primary objects. Thomas Carlyle rewrote his History of the French Revolution after a servant accidentally threw the manuscript into the fire. A modern historian might write a book about the same topic on a word processor, with the chapters carefully backed up on a remote server, the bibliography assembled conveniently with Endnote, the copyediting performed on the author's digital files, and the book produced from them. Digital technology would not be of the "essence" of this project, however, if one could envisage the scholar producing a very similar book during an extended stay in Paris, taking notes by hand in the Bibliothèque Nationale, composing on a manual typewriter in some small apartment within walking distance, and enjoying the culinary and other pleasures of France during off-hours. This scenario retains its charm.

One could of course argue that the use of the computer in this hypothetical example illustrates the cumulative power of Engelbart's "little things." On the other hand, how different would this research scenario be if our scholar's digital tool were a mid-eighties portable Compaq with 256K floppy disks and WordPerfect 4.2? Whichever way you look at such projects, they stay within an emulatory paradigm. There is nothing wrong with this, as long as you recognize that it is not the whole story.

The emulatory paradigm

In the humanities, and more particularly in Literary Studies, the use of digital technology remains for the most part tethered to an 'emulatory model'. Humanist scholars typically encounter the objects of their attention in relatively unmediated forms. The texts, scores, or images they read or look at are rarely the originals, but the surrogates pretend to be close to the originals, and the scholar behaves as if they were. In this regard they differ from scientists or, for that matter, alchemists, who for centuries have dissolved and recombined the targets of their attention for the sake of truth, gold, or both.

Emulation — the same, only better — is a very explicit goal of digital technology in many of its domains. Word processors and digital cameras are obvious examples. The computer screen as a device for reading is an obvious example of failure, at least so far. Much of this failure is due to the low quality of screen displays. Some of it has to do with the deeply engrained readerly habit of seeing the written word in a rectangle that is higher than it is wide. I was very struck by this when I got a monitor that could swivel to portrait mode and emulate the basic layout of a page. All of a sudden it was much easier to read JSTOR essays on the screen. For reading purposes, the horizontal orientation of the typical computer screen may be the greatest obstacle to emulation. [The last paragraph was written before the Kindle, the iPad, and their many tablet successors. When it comes to many ordinary forms of reading, these devices have been game changers.]

If you think of the prospects of textually based scholarship in the digital world, the emulatory paradigm is both a blessing and a curse. On the side of blessings, consider Early English Books Online (EEBO). Scholars who work with texts rarely work with the original. They typically work with a surrogate, whether an edition or a facsimile. Sometimes you use the surrogate because it is the only thing you’ve got. More often you use it because it is in various ways more convenient. I own a copy of the Norton facsimile edition of the Shakespeare Folio, which is a closer surrogate of the original than Bevington, Norton, or Riverside, however you define that elusive term, but I rarely use it, and I daresay I am not unusual in that regard.

Microfilm is the least loved of all surrogates, and it may well be the only technology that will be superseded without a retro comeback. For half a century scholars of Early Modern England have had access to microfilm surrogates of their source documents, and they did not "have to" go to the Bodleian or British Library to look at this or that. If you are lucky enough to be associated with a university that has subscribed to EEBO, you now have a much more convenient surrogate at your fingertips. EEBO gives you a surrogate of a surrogate: what you see inside your web browser is a digital image of the microfilm image of the source text. But this surrogate at two removes is much superior in many ways. If somebody has paid the subscription fee for you, you can get at it anytime from anywhere, and the fact that the cataloguing 'metadata' are also online makes it a lot easier to find what you need.

Some months ago I asked colleagues what difference digital technology made to their research. I particularly remember a colleague who immediately responded by saying “EEBO has changed everything.” EEBO illustrates the beneficial effects of increasing the velocity of scholarly information. What Father Busa calls the “[mere] lessening of human effort or time” can and often does enhance the “quality, depth and extension of research”.

The curse of emulation

Now to the curse of emulation. If you wanted to take advantage of a word processor a quarter century ago, you could not do so without learning something about the ways in which computers process texts, and you needed to familiarize yourself with the 'command line language' of some operating system and program. You could not easily learn this in a day, but it was not rocket science, and over the course of a few weeks you could become quite competent in manipulating textual data in digital form. And you would constantly be reminded of the difference between a printout, a screen image, and the digital facts that underlie both.

There were strong incentives for learning how to process words in such an environment: if you knew how to do it revision became a much simpler task. Moreover, a word processor would automatically renumber your footnotes. I vividly remember conversations in the eighties — the heyday of deconstruction — with job candidates from Yale. They all learned enough to babysit the processing and printing of their dissertations on the university’s mainframe computer because the associated costs and anxieties were far outweighed by the benefits of having your footnotes automatically renumbered. Occasionally, however, footnotes would show up in odd places on the page, and we would joke about ‘ghosts in the margin.’

You don't know about these things anymore if you use Microsoft Word, and by and large today's graduate students know a lot less about text processing than the students who used WordPerfect 4.2 or wrote their dissertations on a mainframe. Good riddance in one way, but a loss in another. The skills you acquired and maintained to do word processing in the old-fashioned way were an excellent platform for text analysis. What kinds of useful analytical operations can you perform on a properly structured digital text that are difficult or impossible to do with its print source or with a digital version that is limited to emulatory use? That is the question about the query potential of the digital surrogate. It is a question whose implications are harder to see intuitively for today's graduate students than for their predecessors of the eighties, who by necessity picked up more knowledge about how a computer goes about its business when it processes text.

The prospects of a wide document space

A few years ago I directed a Mellon-sponsored project called WordHoard, which we called "an application for the close reading and scholarly analysis of deeply tagged texts." It is fundamentally a concordance tool and supports the age-old philological task of "going from the word here to the words there." It contains morphosyntactically tagged texts of Chaucer, Spenser, and Shakespeare. When I wrote to a number of colleagues about this tool, I received a reply from Harold Bloom that read:

Dear Mr. Mueller:
I am a throwback and rely entirely on memory in all my teaching and writing.

Harold Bloom

This amused me because I had been telling my students that WordHoard was a useful tool unless you were Harold Bloom and knew Shakespeare by heart. It is probably the case that Harold Bloom remembers more poetry than many scholars have ever read. He might have said the same thing to the monk who showed him his prototype of a concordance. If you have it “by heart” you are the con-cordance. But most of us are grateful to the likes of Douglas Engelbart for the mechanical “tricks” that are “used over and over again to help [us] do little things.” Not even Harold Bloom can read or remember everything, and for most of us the limits of human memory manifest themselves much earlier.
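At its simplest, the operation WordHoard builds on can be conveyed by a bare-bones keyword-in-context (KWIC) routine: show every occurrence of a word together with a few words of surrounding context. The sketch below is only an illustration of that basic move, not WordHoard's actual machinery, which works over morphosyntactically tagged texts.

    # Minimal keyword-in-context (KWIC) sketch: show each occurrence of a word
    # with a few words of surrounding context -- the core "word here to the
    # words there" move of any concordance.
    def kwic(text, keyword, width=4):
        words = text.split()
        rows = []
        for i, w in enumerate(words):
            if w.lower().strip('.,;:!?') == keyword.lower():
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                rows.append(f"{left:>35} | {w:^10} | {right}")
        return rows

    sample = "To be, or not to be, that is the question."
    print("\n".join(kwic(sample, "be")))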

However closely we read individual texts, interpretation is always a form of contextualizing or of putting a particular detail within a wider frame that gives or receives meaning from an act of 'focalization'. This is a fundamental and recursive procedure that operates from the lowest level of a single sentence through the parts of a work to the level of author, genre, and period. I recently taught a course on Verdi and Dickens as the melodramatic masters of the 19th century. Traviata and Bleak House, both of them published in 1853, were the central works that highlighted the deep shadows haunting 19th-century progress. I drew a link between these works and the sudden English interest in Schopenhauer's work, which was stimulated by John Oxenford's essay of the same year in the Westminster Review.

This is a very conventional way of looking at 19th-century literature, but it shows the progressive contextualization that human readers are very good at. They form large pictures by connecting relatively few dots in striking ways. This virtue is born from necessity. Whether or not the human brain operates like a computer, it is much slower and can perform at most 200 'cycles' per second. To students of artificial intelligence it is a miracle how a person can look across the street and, in less than a second, spot a familiar face in a crowd. There appears to be no way in which a computer can perform such an operation in 200, or for that matter 200,000, sequential steps. This enormous capacity of human intelligence to draw useful conclusions from very limited clues is also its greatest weakness. I like to say that paranoia, or the compulsion to connect all dots, is the professional disease of intelligence.

Large digital text archives offer the promise of complementary forms of contextualization. Human readers of the opening words of Emma ("Emma Woodhouse, handsome, clever, and rich") immediately recognize them as an instance of the "three-adjective rule" that is common in Early Modern prose. Some readers might be curious whether a systematic look at many instances would reveal interesting semantic or prosodic patterns, whether some writers use this trick a lot, and what it might tell you about them. Gathering examples by hand is a very tedious process, and you would not be likely to undertake it unless you had a strong hunch, backed up by salient detail, that something interesting is going on.

But now imagine that you have access to a very large linguistically annotated literary archive of the kind that corpus linguists have used for half a century. In a 'linguistic corpus' texts are not meant to be read by humans but processed by a machine. The texts have certain rudiments of readerly knowledge added to them. Every word declares that it is a plural noun, a verb in the past tense, and so on — the very things that upset Shakespeare's rebel peasant Jack Cade in his indictment of Lord Say:

It will be proved to thy face that thou hast men about thee that usually talk of a noun and a verb and such abominable words as no Christian ear can endure to hear

Literary scholars are likely to vary this into something like "talk of a noun and a verb and such tedious words as no literary ear can endure to hear." Why make explicit what the critic's innate 'esprit juste' perceives 'par une sorte d'instinct', to use Laplace's terms? The point, of course, is that the machine has no such instinct, but if properly instructed (which is itself a largely automatic process), it can within minutes range across a corpus of 100 million words or more and retrieve all sequences that follow the pattern 'adjective, adjective, conjunction, adjective.' If the archive is properly encoded and the access tool is sufficiently flexible, the outcome of such a search might be a list of all the sentences containing this pattern. If there are thousands of them, as there may well be, you may be able to group the sentences by time or by the sequence of adjectives. An hour or two spent with such a list may be enough to tell you whether an interesting story is hiding in that list.
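A minimal sketch of such a pattern search, using NLTK's general-purpose tokenizer and tagger rather than the purpose-built annotated archive described above, might look like the following; JJ and CC are the Penn Treebank labels NLTK uses for adjectives and coordinating conjunctions, and the Emma opening serves as the test sentence.

    # Sketch of an "adjective, adjective, conjunction, adjective" search using
    # NLTK's off-the-shelf tagger instead of a purpose-built annotated corpus.
    # Requires: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
    import nltk

    def triple_adjective_sentences(text):
        """Return sentences containing the pattern ADJ ADJ CC ADJ (punctuation ignored)."""
        hits = []
        for sent in nltk.sent_tokenize(text):
            tagged = nltk.pos_tag(nltk.word_tokenize(sent))
            # Drop punctuation so "handsome, clever, and rich" reads JJ JJ CC JJ.
            tags = [tag for word, tag in tagged if word.isalpha()]
            for i in range(len(tags) - 3):
                if (tags[i].startswith("JJ") and tags[i + 1].startswith("JJ")
                        and tags[i + 2] == "CC" and tags[i + 3].startswith("JJ")):
                    hits.append(sent)
                    break
        return hits

    opening = ("Emma Woodhouse, handsome, clever, and rich, with a comfortable "
               "home and happy disposition, seemed to unite some of the best "
               "blessings of existence.")
    print(triple_adjective_sentences(opening))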

I once did this with 19th century fiction and have to confess that the results were rather nugatory. I did, however, discover that Charlotte Bronte was inordinately fond of this pattern. And not much time was spent or lost on this particular wild goose chase.

I have come to use the acronym DATA for such inquiries, which, to quote an IBM executive, put "dumb but fast" machines in the service of "smart but slow" humans. DATA stands for 'digitally assisted text analysis', and the operative word here is 'assisted'. Like 'complementary contextualization', the acronym DATA makes no claim that digitally assisted text analysis marks the end of reading or moves the business of interpretation to another realm. Let me conclude by turning to the preface to a collection of essays by the great German classicist Karl Reinhardt. Reinhardt throughout his career was torn between a deep allegiance to the 'positivistic' Altertumswissenschaft of Wilamowitz and an equally deep allegiance to the lapsed classicist Nietzsche ('what use is the authentic text if I don't understand it?'). The contested word in that continuing dilemma was 'philology'.

Die Philologie wird sich selbst umso fraglicher, je weniger sie von sich lassen kann. Das heisst nicht, dass sie innerhalb ihres Bereichs an Zuversichtlichkeit verlöre. Heisst auch nicht, dass sie vor der Erweiterung ihrer Grenzen, die die geisteswissenschaftliche Entwicklung ihr gebracht hat, sich verschlösse. Aber es gehört zum philologischen Bewusstsein, mit Erscheinungen zu tun zu haben, die es transzendieren. Wie kann man versuchen wollen, philologisch interpretatorisch an das Herz eines Gedichts zu dringen? Und doch kann man philologisch interpretatorisch sich vor Herzensirrtümern bewahren. Es geht hier um eine andere als die seinerzeit von Gottfried Hermann eingeschärfte ars nesciendi. Da ging es um Dinge, die durch Zufall, durch die Umstände der Überlieferung, durch die Unzulänglichkeit des Scharfsinns sich nicht wissen liessen. Hier geht es um etwas notwendigerweise Unerreichliches, dessen Bewusstwerden doch rückwirkt auf das, was es zu erreichen gilt. Es handelt sich nicht um das letzte allgemeine Ignoramus, sondern um jene methodische Bescheidung, die sich klar ist, immer etwas ungesagt lassen zu müssen, auch mit allem Merk- und Einfühlungsvermögen an das Eigentliche nicht herandringen zu können, nicht zu dürfen. (Tradition und Geist. Göttingen, 1960, p. 427)

Philology becomes more questionable to itself the more it does what it cannot stop doing. This does not mean that it loses its confidence within its own domain. Nor does it mean that it excludes itself from the expansion of its borders that developments in the humanities have opened to it. But it is part of philological awareness that one deals with phenomena that transcend it. How can one even try to approach the heart of a poem with philological interpretation? And yet, philological interpretation can protect you from errors of the heart. This is not a matter of the ars nesciendi that Gottfried Hermann insisted on in his day. There it was a matter of things you could not know because of accident, the circumstances of transmission, or inadequate acuity. Here it is a matter of something that is necessarily beyond reach, but our awareness of it affects our way of reaching towards it. It is not a matter of the ultimate Ignoramus but of a methodological modesty that is aware of something that must be left unsaid and that, with all perceptiveness or intuition, you cannot and should not trespass on.

 

Substitute 'e-philology', 'Natural Language Processing', or 'quantitative text analysis' for 'philology' and you have a wonderfully pertinent statement of what digitally assisted text analysis can and cannot do in the domain of Literary Studies. Notice that Reinhardt's remarks about the expansion of philological borders comfortably include computers. Indeed there is irony in the fact that his contemporaries would have been enthusiastic adopters of the digital tools that the current literary academy does not quite know what to do with. They are wonderful tools, as long as you follow Hamlet's advice to the players and use them with "modesty and cunning."

Ben Pauley, Building New Tools for Digital Bibliography @ NUDHL, Fri, 1/11/13, 12-2pm, AKiH

 “Building New Tools for Digital Bibliography: Constructing a Defoe Attributions Database for the Defoe Society”

Dr. Ben Pauley, Associate Professor, Eastern Connecticut State University

Friday, January 11, from 12 to 2 pm in the Alice Kaplan Humanities Institute seminar room, Kresge 2-360.

Lunch served!!

And don’t miss…

Unlocking the English Short Title Catalogue: New Tools for Early Modern and Eighteenth-Century Bibliography and Book History

A Digital Humanities Presentation to Students and Faculty by Ben Pauley, Associate Professor, Eastern Connecticut State University, NU Library Forum Room,
Thursday, January 10, 2013, 3:30 – 5:00 – Refreshments will be served.

The English Short Title Catalogue (ESTC) is the most comprehensive guide in existence to the output of published books in the English-speaking world during the era of handpress printing. With nearly 500,000 bibliographic records and information on more than three million library holdings, it is both the best census that we have of early British and American print and the best available guide to locating extant copies of those items.

Begun in the late 1970s, the ESTC was conceived from the first as an electronic resource, one that would leverage new developments in library technology to facilitate collaboration among scholars and librarians worldwide and one—crucially—that could be continuously revised and refined. In recent years, however, it has become clear that the ESTC is in need of fundamental transformation if it is to keep pace with a scholarly landscape that is being transformed by digitization.

Professor Pauley's talk will highlight the challenges and opportunities facing the ESTC in its fourth decade, and will present the recommendations of a Mellon-funded planning committee for redesigning the ESTC as a 21st-century research tool. As envisioned, the new ESTC will stand at the intersection of librarianship, bibliography, and the digital humanities, facilitating new kinds of enquiry in fields such as literary and cultural history, bibliography, and the history of the book.

This event is sponsored by Northwestern University Library's Center for Scholarly Communication and Digital Curation, NUL Special Libraries, and the WCAS Department of English.


Professor Ben Pauley (Ph.D. Northwestern, 2004) specializes in eighteenth-century literature, with an emphasis on the works of Daniel Defoe. In addition to publishing essays and presenting papers in eighteenth-century literary studies, he has been involved in several digital projects, particularly concerning bibliography. He is the editor and administrator of Eighteenth-Century Book Tracker (www.easternct.edu/~pauleyb/c18booktracker), an index of freely-available facsimiles of eighteenth-century editions. He was co-principal investigator, with Brian Geiger (Director, Center for Bibliographical Studies and Research, University of California-Riverside), of “Early Modern Books Metadata in Google Books,” a recipient of a Google Digital Humanities Research Award for 2010–11 and 2011-12. He is a member of the board of the Defoe Society, serves on the technical review board for 18thConnect, and is an advisor to the recently-launched 18th-Century Common, a public Humanities portal for research in eighteenth-century studies.

 

A Gentle Introduction to Digital Text Analysis, 11/14

Subject: A Gentle Introduction to Digital Text Analysis – SRTS event Nov 14th

Please join us for the last Scholarly Resources & Technology Series event of the fall quarter:

A Gentle Introduction to Digital Text Analysis

Date: Wednesday, Nov 14th
Time: 5:00pm to 6:00pm

Using computers to analyze and visualize literary texts is a practice with a long history in the digital humanities. This presentation outlines that history and explores a few of the latest digital tools that enable scholars to use computational methods to analyze individual texts and corpora. The presentation will use Jade Werner's work on the revision history of Lady Morgan's Luxima, the Prophetess (1859). No programming experience required.

Presenters:
Jade Werner, Doctoral Student, English Department
Josh Honn, Digital Scholarship Fellow, Center for Scholarly Communication & Digital Curation

Registration not required.

—————-

Also a reminder for this Friday's event at noon in the Library Forum Room: Professor Owen encourages you to bring your iPad to follow along.

Sincerely,
Scholarly Resources & Technology Series team