About – Prosody & Speech Dynamics Lab

What is prosody?

A sentence can be said in many different ways: through the selective emphasis or reduction of individual words, through the placement of pauses and other marks of disjuncture between words, and through pitch melodies that overlay words and phrases. These patterns, together with patterns of voice quality, make up the prosody of a spoken utterance. Languages and dialects may differ in their prosodic patterns, giving rise to noticeable differences in rhythmic timing, pitch melodies, or other sound properties that contribute to the lay notion of “accent”. Languages and dialects may also differ in the linguistic function of prosody, that is, the role prosodic patterns play in demarcating phrases and discourse segments, in signaling the information status of a word or phrase in relation to the evolving discourse context, or in marking the illocutionary force of an utterance, for example, as a statement or question. Linguistic accounts of prosody typically locate the basis of rhythm, pitch melodies and other prosodic effects in phonological structures that designate edge positions and positions of prominence in domains the size of words, phrases and larger units. Beyond their status in the grammatical system of a language, the prosodic dimensions of speech (rhythmic timing, pitch, voice quality, and loudness) are also shaped by non-linguistic factors, including the speaker’s affect and emotional state, the communication setting, and the social identities of the speaker and addressee.

What are speech dynamics?

Speech happens over intervals of time, on scales from the millisecond timing of individual consonants and vowels, to the seconds-long intervals of conversational turns, and even longer segments of prolonged communication. Speech dynamics are the systematic patterns in the sequencing of speech sounds and in their manifestation in the acoustic speech signal, which can be observed at one or more time scales. These dynamics include patterns in speech production as well as speech perception. Speech dynamics depend in part on the prosodic structuring of speech. Indeed, patterns of change in prosodic dimensions such as pitch and rhythmic timing provide perceptually salient cues to the organization of words into structural units that index syntactic and semantic properties of sentences. Speech dynamics are also observed in phenomena of entrainment or accommodation, in which the speech behaviors of one speaker depend on those of their interlocutor. Yet another layer of dynamics is found in the relationship between speech patterns, especially prosodic patterns, and the gestures of the head, face, hands and body that accompany speech.

Research methods

The Prosody & Speech Dynamics Lab investigates prosody and speech dynamics in American English and many other languages and dialects, through the quantitative analysis of acoustic and behavioral data, as well as linguistic data drawn from symbolic phonological, syntactic, semantic and pragmatic representations. We collect data using observational and experimental methods. Observational methods involve the construction and analysis of audio or audio-visual speech databases, with speech produced as narrative, in task-oriented dialogues, or in free conversation. Experimental methods targeting speech production involve speech elicitation tasks conducted in a laboratory setting, using experimenter-designed text or audio prompts. Experimental methods targeting speech perception or comprehension involve identification, annotation, or discrimination tasks conducted in a laboratory or over the internet. Our internet research involves participants selected through targeted recruitment or through crowd-sourcing platforms such as Amazon Mechanical Turk.

A central focus of our research is modeling the patterned variation in speech, especially variation in pitch, timing, or other dimensions of sound structure that relate to prosodic context. We use statistical methods to model patterned variation in data from speech production, perception, and comprehension, with the goal of capturing the linguistic factors that influence variation and the parameters along which individual speakers and hearers differ in their behavior.

Our aim is to produce scientific results that are replicable and that generalize to other datasets and different analytic methods. The methods, tools, and databases we develop are available for other researchers, subject to requirements protecting the privacy of research participants.