Ngram-ing, Big Data, Literature, & Culture

Fellow NUDHL-ers —

Happy summer! Hope you’re all finding a bit of time to decompress after a hectic academic year. I think I’m a bit late to the party on this one, but have just started really looking at Google’s Ngram Viewer. I recently happened upon this blog post by UC Berkeley’s Prof. Claude Fischer, and thought I’d throw it on here for those of you who don’t track the #NUDHL Twitter tag.

I’m really interested in some of the challenges confronting this kind of analysis, which Fischer mentions at the end of the post. For example: Which books are included for analysis, and how representative are they of broader cultural and social belief systems, linguistic patterns, etc.? How can this kind of tool account for the ways the meanings attached to particular words and phrases change over time? (I also wonder: how can this kind of analysis account for the fact that words and meanings are continually being discursively contested?)
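For the curious, here is a minimal sketch (in Python) of the arithmetic behind a viewer like this: the relative frequency of a phrase among all phrases of the same length in each year's books. The tiny corpus_by_year below is invented purely for illustration; the real corpus spans millions of volumes.

```python
# A toy stand-in for a dated corpus: year -> the text of "all books" from
# that year. (Invented for illustration only.)
corpus_by_year = {
    1900: "the telegraph hummed and the telegraph clerks listened",
    1950: "the television glowed while the radio played on",
    2000: "the internet grew and the internet never slept",
}

def ngram_frequency(phrase, text):
    """Relative frequency of a phrase among all n-grams of the same length."""
    words = text.lower().split()
    n = len(phrase.split())
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams.count(phrase.lower()) / len(grams) if grams else 0.0

for year, text in sorted(corpus_by_year.items()):
    print(year, round(ngram_frequency("the internet", text), 3))
```

Even in toy form, Fischer's caveats are visible: the numbers depend entirely on which texts end up in the corpus, and the counts say nothing about what a phrase meant to the people who wrote it.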

Anyway, thought it was an interesting summary that some of you might like to read. Happy June!

 

WGBH Media Library and Archives

From: Allison Pekel <allison_pekel@wgbh.org>
Date: Wed, May 1, 2013 at 12:04 PM

I am working on a project that I thought might be of interest to the
American history community.

I work for WGBH Boston in the Media Library and Archives, which has
been funded by the Mellon Foundation to work with academic scholars
interested in using our moving image and sound materials in the course
of their research. We hope to increase public awareness of the vast
collections that digital repositories hold by publishing our entire
archival catalogue online for open access and use.

Placing the catalogue online, however, is only the first step, as
records may be incomplete or misleading. To help enhance the quality of
our records, we are inviting scholars, teachers, and students to
research our catalogue and contribute their own discoveries and
findings back to us. There are also limited opportunities to catalogue
and curate an online collection specific to your field of research as
part of Open Vault (http://openvault.wgbh.org). Final products could
include essays on your topic, streaming public access to a selection of
media in your collection, metadata for the items in your collection,
and/or a presentation of your findings at a conference.

As a producer of Frontline and Boston Local News, we have quite a few
materials in the American History genre, so if you have an ongoing research
project and would consider utilizing moving image and sound materials in
your work, please don’t hesitate to contact me.

Allison Pekel
WGBH Media Library and Archives
Allison_Pekel@WGBH.org

Opportunity: Jump-start your Python, R and Gephi skills, Nijmegen, Radboud University

Dr Mike Kestemont, University of Antwerp; Dr Marten Düring, Radboud
University Nijmegen
3–5 April 2013, Nijmegen, Radboud University
Deadline: 15 March 2013

Jump-start your Python, R and Gephi skills

This intensive three-day workshop will equip both junior and senior
scholars with the skills to “go digital”. The goal of the workshop is
to help participants understand the potential of selected Digital
Humanities (DH) tools, to consider their application within their own
fields and, eventually, to start eHumanities projects of their own.
The workshop will consist of three modules: Programming in Python,
Statistics in R, and Network Analysis with Gephi. The modules are
designed to build on one another, so that newly acquired skills are
put to practical use immediately. We also want to foster productive
exchange among participants and instructors, and with it the
development of long-lasting networks. In keeping with the ALLC’s
principal interests, the workshop has a firm emphasis on the
computational analysis of textual data, be they literary or linguistic.

To ensure broad coverage of relevant techniques, we have selected
three general-purpose research tools that are currently widely applied
within the eHumanities.
The programming language Python is used across many scientific domains
and is readily accessible to scholars from the Humanities. It is an
excellent choice for dealing with the (linguistic as well as literary)
textual data so typical of the Humanities. Workshop participants will
be thoroughly introduced to the language and taught to program basic
algorithmic procedures. Because of the workshop’s emphasis on textual
data, special attention will be paid to linguistic applications of
Python, e.g. the Pattern library. Finally, participants will be
familiarized with key skills in independent troubleshooting.
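By way of illustration, here is a minimal Python sketch of the kind of basic procedure a first module like this typically covers (it is not drawn from the workshop's actual materials): tokenizing a text and computing a frequency table plus a crude stylistic measure.

```python
import re
from collections import Counter

def tokenize(text):
    """Split a text into lowercase word tokens; a deliberately simple rule."""
    return re.findall(r"[a-z]+", text.lower())

text = (
    "Call me Ishmael. Some years ago - never mind how long precisely - "
    "having little or no money in my purse, I thought I would sail about."
)

tokens = tokenize(text)
freq = Counter(tokens)
print(freq.most_common(5))      # the five most frequent word types
print(len(freq) / len(tokens))  # type/token ratio, a crude measure of lexical variety
```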

As many DH scholars lament, most humanities curricula today fail to
offer decent training in statistics. At the same time, a majority of
DH applications make use of quantitative tools in one way or another.
We seek to give participants hands-on experience with a common
statistical tool, R, with a specific emphasis on the practical
implementation of statistics and its potential pitfalls. The
statistical software package R is widely used in the scientific
processing and visualisation of textual data.
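The announcement does not spell out the pitfalls, but one classic example with textual data is easy to sketch; the snippet below uses Python rather than R simply to keep all examples here in one language, and the counts are hypothetical. Word frequencies are so skewed that the mean and the median of the same data tell very different stories.

```python
from statistics import mean, median

# Hypothetical word-frequency counts for a small text: a handful of very
# common words plus a long tail of words occurring only once -- the
# skewed shape typical of natural language.
counts = [1200, 640, 310, 150, 80, 40, 20, 10] + [1] * 500

print(mean(counts))    # ~5.8: pulled far upward by the few frequent words
print(median(counts))  # 1: most word types occur exactly once
```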

Network visualizations can be counted among the most prominent and
influential forms of data visualization today. However, the processes of
data modelling, its visualization and the interpretation of the results
often remain a “black box”. The module on Gephi will introduce the key
steps: systematizing relational data; collecting it from
non-standardized records such as historical sources or works of
fiction; weighing the potential and perils of network visualization
and computation; and, finally, identifying relevant patterns and
assessing their significance for the overall research question.
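As a rough sketch of that first step, turning non-standardized records into relational data that Gephi can ingest, the following Python snippet (with invented scenes and character names) converts co-appearances into a weighted edge list and writes it as a Source,Target,Weight CSV, one of the table layouts Gephi imports.

```python
import csv
from collections import Counter
from itertools import combinations

# Invented records: each "scene" lists the characters appearing in it --
# the kind of relational data one might extract from a work of fiction.
scenes = [
    ["Holmes", "Watson"],
    ["Holmes", "Watson", "Lestrade"],
    ["Holmes", "Moriarty"],
    ["Watson", "Lestrade"],
]

# Each co-appearance within a scene becomes a weighted, undirected edge.
edges = Counter()
for scene in scenes:
    for pair in combinations(sorted(set(scene)), 2):
        edges[pair] += 1

# Write a Source,Target,Weight table that Gephi can import.
with open("edges.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
```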

The workshop seeks to convey as much practical skill and knowledge in
as little time as possible. Each module will have the same basic
structure: after an introduction to the method and the targets for the
day, participants will solve pre-defined tasks. The workshop embraces
trial and error, and learning through one’s own accomplishments rather
than the passive reception of information.

Registration

Participants are expected to pay a fee of EUR 60 and to make their own
arrangements for travel and accommodation. Thanks to funding from the
EADH (formerly ALLC), we are able to offer free lunch on all three
days as well as a farewell dinner.

In addition, we can offer two bursaries for students/participants who
have no other source of funding.

To register, please email Mike Kestemont at mike.kestemont@gmail.com
or Marten Düring at md@martenduering.com by March 15th. Applicants are
asked to include a short CV, a statement of their previous experience
with the above-mentioned tools, and their research goals.

Previous experience in programming, statistics, or data visualization
is not required.

For further information on eHumanities research at Radboud University
Nijmegen and on the workshop, please visit http://www.ru.nl/ehumanities

Generously funded by the ALLC – The European Association for Digital
Humanities and with support from Radboud University Nijmegen

————————————————————————
Programme

We are very happy to have brought together a team of instructors who are
both experts in their field and great teachers:

Day 1: Programming in Python and basic Natural Language Processing tools
(Instructors: Folgert Karsdorp, Meertens Instituut, Amsterdam, and
Maarten van Gompel, Radboud Universiteit Nijmegen)

Day 2: Basic statistics in R (Instructor: Peter Hendrix, University of
Tübingen)

Day 3: Data modelling and network visualizations in Gephi (Instructor:
Clément Levallois, Erasmus University Rotterdam)

Homepage <http://www.ru.nl/ehumanities>

Citation URL for this entry:
<http://hsozkult.geschichte.hu-berlin.de/termine/id=21210>


Ben Pauley, Building New Tools for Digital Bibliography @ NUDHL, Fri, 1/11/13, 12-2pm, AKiH

 “Building New Tools for Digital Bibliography: Constructing a Defoe Attributions Database for the Defoe Society”

Dr. Ben Pauley, Associate Professor, Eastern Connecticut State University

Friday, January 11, from 12 to 2 pm in the Alice Kaplan Humanities Institute seminar room, Kresge 2-360.

Lunch served!!

And don’t miss…

Unlocking the English Short Title Catalogue: New Tools for Early Modern and Eighteenth-Century Bibliography and Book History

A Digital Humanities Presentation to Students and Faculty by Ben Pauley, Associate Professor, Eastern Connecticut State University, NU Library Forum Room,
Thursday, January 10, 2013, 3:30 – 5:00 – Refreshments will be served.

The English Short Title Catalogue (ESTC) is the most comprehensive guide in existence to the output of published books in the English-speaking world during the era of handpress printing. With nearly 500,000 bibliographic records and information on more than three million library holdings, it is both the best census that we have of early British and American print and the best available guide to locating extant copies of those items.

Begun in the late 1970s, the ESTC was conceived from the first as an electronic resource, one that would leverage new developments in library technology to facilitate collaboration among scholars and librarians worldwide and one—crucially—that could be continuously revised and refined. In recent years, however, it has become clear that the ESTC is in need of fundamental transformation if it is to keep pace with a scholarly landscape that is being transformed by digitization.

Professor Pauley’s talk will highlight the challenges and opportunities facing the ESTC in its fourth decade, and will present the recommendations of a Mellon-funded planning committee for redesigning the ESTC as a 21st-century research tool. As envisioned, the new ESTC will stand at the intersection of librarianship, bibliography, and the digital humanities, facilitating new kinds of enquiry in fields such as literary and cultural history, bibliography, and the history of the book.

This event is sponsored by Northwestern University Library’s Center for Scholarly Communication and Digital Curation, NUL Special Libraries, and the WCAS Department of English.


Professor Ben Pauley (Ph.D. Northwestern, 2004) specializes in eighteenth-century literature, with an emphasis on the works of Daniel Defoe. In addition to publishing essays and presenting papers in eighteenth-century literary studies, he has been involved in several digital projects, particularly concerning bibliography. He is the editor and administrator of Eighteenth-Century Book Tracker (www.easternct.edu/~pauleyb/c18booktracker), an index of freely available facsimiles of eighteenth-century editions. He was co-principal investigator, with Brian Geiger (Director, Center for Bibliographical Studies and Research, University of California-Riverside), of “Early Modern Books Metadata in Google Books,” a recipient of a Google Digital Humanities Research Award for 2010–11 and 2011–12. He is a member of the board of the Defoe Society, serves on the technical review board for 18thConnect, and is an advisor to the recently launched 18th-Century Common, a public humanities portal for research in eighteenth-century studies.

 

Notecards & Cowboy Hats

How do we not only take notes, but also take note of the ways that the digital transforms the research process?

At the tail-end of our first meeting, Justin Joyce brought up the question of how he might apply the digital to his collection of notecards that attempt to codify whether the good and bad guys indeed wore white or black hats in classic Western films.

At first, we pondered how computational power might not be very adept at addressing the difficult question of judging the good guys from the bad (not that people are all that skilled at this task either!). This is, of course, one of the key questions about thinking through algorithmic analysis. But then we began to talk about more than just how the digital is not some kind of positivistic fantasy of attaining definitive analysis. We also broached the question of whether new modes of presenting research in digital form might provide fresh possibilities for the ways that arguments look and feel, and for what they ultimately mean. Could Justin do something interesting merely by scanning his original notecards and presenting his findings in the digital medium in ways that might produce new perspectives on his research question?
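To make the thought concrete: even before any presentation layer, "going digital" could be as simple as transcribing each card into a structured record and letting the machine do the tallying, while the genuinely hard call, namely who counts as a good guy, stays with the researcher. Here is a toy sketch; every film, character, and value below is a placeholder rather than a claim about any actual Western.

```python
from collections import Counter

# Hypothetical notecard transcriptions -- placeholder films and values,
# not data about real films.
cards = [
    {"film": "Western A", "character": "hero",    "role": "good", "hat": "white"},
    {"film": "Western A", "character": "gunman",  "role": "bad",  "hat": "black"},
    {"film": "Western B", "character": "sheriff", "role": "good", "hat": "black"},
    {"film": "Western B", "character": "outlaw",  "role": "bad",  "hat": "black"},
]

# Cross-tabulate moral role against hat color to see how well the
# white-hat/black-hat convention actually holds up in the sample.
tally = Counter((card["role"], card["hat"]) for card in cards)
for (role, hat), n in sorted(tally.items()):
    print(f"{role} guys in {hat} hats: {n}")
```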

This part of our conversation came back to mind for me when I recently browsed a few blog posts by Rachel Leow (thanks to Josh Honn for the link), Matthew Kirschenbaum, Sasha Hoffman, and Thomas Riley. These posts all describe efforts to combine analog and digital modes of note-taking in research. Tools used: DevonThink, Scrivener, Zotero, Evernote, among others. I share these musings with the graduate students, librarians, and tech folks among us as potentially useful explorations of what we might call “the question of the digital note.” It strikes me that this is not only a practical issue of managing research, but also a question of how the structure of the research process in the digital medium might inspire new ideas, approaches, questions—in short, the research process, transferred into the digital in more consciously developed ways, might lead to new kinds of findings.