(Online Talk) November 11, 2020: Melissa Baese-Berk

Dr. Melissa Baese-Berk will be presenting on a review paper of hers on the topic of perceptual learning of native and non-native speech. The title and abstract of the paper are below; the paper can be accessed here: https://doi.org/10.1016/bs.plm.2018.08.001.

Perceptual Learning for Native and Non-native Speech

In order to successfully perceive a language, a listener must be able to flexibly adapt to their input; however, a listener must also keep their perceptual system stable in order to preserve its integrity. Given the competing requirements of demonstrating flexibility while maintaining stability, it is critically important to understand under what circumstances listeners will demonstrate flexibility and under what circumstances they will not. In this chapter, I address this question using three types of perceptual learning for speech: perceptual flexibility of existing speech sound categories, acquisition of novel speech sound categories, and adaptation to unfamiliar accented speakers. These types of learning highlight the importance of balancing flexibility and stability in the perceptual system. In addition to highlighting previous results, I explore their implications for language production and language learning. Further, exploring these three types of learning together underscores the importance of investigating speakers and listeners from a variety of language backgrounds.

(Online Talk) October 28, 2020: Melissa Baese-Berk

Our meetings this quarter will be held on Zoom. Please sign up for the listserv to receive the Zoom link (instructions in sidebar).

Production Learning of Non-Native Speech Contrasts after Training in Perception or Production
Melissa Baese-Berk, Zoë Haupt, Zachary Jaggers, Arthur Samuel, Tillena Trebon, Maggie Wallace, and Allegra Wesson

Previous work has found that simultaneous training of a non-native sound distinction in both perception and production can disrupt rather than enhance perceptual learning. In spite of this disruption, subjects trained in both modalities have shown gains in learning to produce the distinction they were trained on, compared to perception-only training. The current study examines the learning of a new sound distinction in production for participants in a variety of training conditions. Spanish native speakers are trained on a novel contrast. Given that learners’ productions of the contrast may not be identical to the way native speakers distinguish it, we apply Linear Discriminant Analysis to acoustic measurements of subjects’ post-test productions to classify whether and how they do distinguish the categories in a potentially multidimensional space. This classification model is then applied across conditions to compare production learning across training modes and examine how production learning relates to perceptual learning.
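To make the classification step concrete, here is a minimal sketch of how Linear Discriminant Analysis can be applied to acoustic measurements of post-test productions, assuming a table with one row per token; the file name, feature columns, and the use of scikit-learn are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch: classify productions of a trained contrast from acoustic
# measurements using Linear Discriminant Analysis (hypothetical file/column names).
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Hypothetical data: one row per post-test production, with acoustic
# measurements and the category the speaker intended to produce.
df = pd.read_csv("posttest_productions.csv")                            # hypothetical
features = ["duration_ms", "f2_midpoint_hz", "center_of_gravity_hz"]    # hypothetical
X, y = df[features], df["intended_category"]

lda = LinearDiscriminantAnalysis()
# Cross-validated accuracy: how well the acoustics separate the two categories.
accuracy = cross_val_score(lda, X, y, cv=5).mean()
print(f"Mean cross-validated classification accuracy: {accuracy:.2f}")

# The fitted coefficients indicate which acoustic dimensions carry the
# distinction, i.e. *how* (not just whether) the speaker separates the categories.
lda.fit(X, y)
print(dict(zip(features, lda.coef_[0])))
```

Because the discriminant is fit over several acoustic dimensions at once, it can detect a distinction even when speakers separate the categories along a dimension native speakers do not primarily use.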

(Online Talk) October 21, 2020: Ann Bradlow

Our meetings this quarter will be held on Zoom. Please sign up for the listserv to receive the Zoom link (instructions in sidebar).

Second-Language French, Spanish, and English All Exhibit Low Information Rate Relative to First-Language Speech

Listening to even highly intelligible foreign-accented speech can be slow and effortful. This research proposes that the extra effort required for L2 speech understanding is related to its suboptimal information transmission profile. Specifically, slow speech rate (i.e. few syllables/second) combines with low information density (i.e. more syllables for a given text/meaning) to yield very low information rate (i.e. less information conveyed/second) for L2 versus L1 speech. Based on L1 and L2 recordings of a standard text, we show that L2 English, L2 French and L2 Spanish all exhibit slower speech rates and lower information densities than their L1 counterparts. Lower information density for L2 speech results from substantial syllable reduction in L1 speech (all languages) in contrast to either no reduction (Spanish) or syllable epenthesis (English and French) in L2 speech. Thus, across languages, L2 speech involves slow, information-sparse syllables, leading to suboptimal information transmission to the listener.
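The core relationship can be read as: information rate (information/second) = information density (information/syllable) × speech rate (syllables/second). A small illustrative calculation with invented numbers (not the study's measurements) shows why the two factors compound:

```python
# Illustrative arithmetic only; all values are hypothetical, not from the study.
# information rate (info/second) = information density (info/syllable)
#                                  * speech rate (syllables/second)
def information_rate(info_density, speech_rate):
    return info_density * speech_rate

# Suppose L1 talkers render the reference text in denser syllables at a faster
# rate than L2 talkers of the same language.
l1 = information_rate(info_density=1.00, speech_rate=6.0)   # hypothetical values
l2 = information_rate(info_density=0.85, speech_rate=4.5)   # hypothetical values
print(f"L1: {l1:.2f}  L2: {l2:.2f}  L2/L1 ratio: {l2 / l1:.2f}")
# Both factors are lower for L2 speech, so their product (information rate)
# drops more sharply than either factor alone.
```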

June 3: Joseph C.Y. Lau and Sandra R. Waxman

Joseph C.Y. Lau will be presenting joint work with Sandra R. Waxman from the Infant and Child Development Center at NU. Sign up for the listserv to receive Zoom details.

Which acoustic features do infants use to link language (and a few other signals) to cognition? A machine-learning approach

Language is central to human cognition. As a hallmark of our species, language is a powerful tool that permits us to go beyond the here-and-now, to establish mental representations, and to communicate to others the content of our minds. Research in our group has tackled the question of how, and how early, infants establish a language-cognition link. Two decades of behavioral studies in our group have documented that by 3 months of age, infants link language (and a few other signals) to cognition (measured by object categorization). Interestingly, this cognitive advantage is evident when infants listen to infant-directed speech (IDS) in their own native language (e.g. English for infants from an English-speaking environment) and in some (e.g. German), but not all (e.g. Cantonese), non-native languages. Decades of studies have shown that speech processing in early infancy is tuned by the language environment. We have shown that this perceptual tuning has downstream conceptual consequences for which signals infants link to cognition. Moreover, this link between language and cognition is disrupted when language samples are perturbed (e.g. presented in reverse). Surprisingly, at 3-4 months, language is not the only signal that supports infant cognition: listening to vocalizations from non-human primates (e.g. blue-eyed Madagascar lemurs), but not birds (e.g. zebra finches), also supports infant object categorization. But which acoustic features, singly or in combination, do infants use to link this small subset of signals to cognition? Addressing this question is crucial to understanding the underpinnings of the language-cognition link. The proposed project tests the hypothesis that there are acoustic properties shared among the identified “privileged” signals (e.g. English IDS, German IDS, and lemur calls), and that these properties are also instrumental in acoustic and speech processing in early infancy. The goal is to identify common acoustic features in a data-driven approach, using supervised machine-learning models that search multiple acoustic domains for representations that maximally classify the natural classes of “privileged” and “non-privileged” signals, either separately for linguistic (e.g. English and German IDS vs. Cantonese IDS) and non-linguistic vocalizations (e.g. lemur calls vs. zebra finch songs), or across all signals regardless of their linguistic vs. non-linguistic nature. Further, by modeling the different classifications of “privileged” and “non-privileged” signals at 4 months vs. 6 months (e.g. lemur calls are “privileged” at 4 months but “non-privileged” at 6 months), this project seeks not only to pinpoint which acoustic features undergird the striking behavioral findings, but also to model how developmental changes in the salience of acoustic features may subserve behavioral changes from 3 to 7 months. If successful, this project will also shed light on the evolutionary and developmental antecedents of the language-cognition link. Modeling results will also allow us to evaluate the hypothesis that there exist separate but parallel pathways by which linguistic and non-linguistic signals facilitate infant cognition, based on different combinations of acoustic parameters. I will close by discussing how this study may illuminate the fundamental role of prenatal and postpartum neurophysiological sensory experience in establishing the uniquely human language-cognition link.
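One way to picture the modeling plan: extract a bank of acoustic features from each vocalization, train a supervised classifier to separate “privileged” from “non-privileged” signals, and then inspect which features drive the classification. The sketch below illustrates that general strategy only; the file name, feature set, labels, and choice of classifier are all assumptions, not the project's actual models.

```python
# Sketch of a supervised "privileged vs. non-privileged" classifier over
# acoustic features; file and column names are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("vocalization_features.csv")    # hypothetical: one row per clip
feature_cols = ["f0_mean", "f0_range", "spectral_centroid",
                "harmonicity", "amplitude_modulation_rate"]   # hypothetical
X = df[feature_cols]
y = df["privileged"]   # e.g. English/German IDS and lemur calls = 1, others = 0

clf = RandomForestClassifier(n_estimators=500, random_state=0)
print("Cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())

# Feature importances suggest which acoustic dimensions best separate the
# classes; these are candidate features infants might rely on.
clf.fit(X, y)
print(sorted(zip(feature_cols, clf.feature_importances_), key=lambda p: -p[1]))
```

The same setup, refit with 6-month labels (where lemur calls move to the “non-privileged” class), would let one compare which features change in importance across the two ages.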

Online talk May 13: Jeffrey Lamontagne (McGill)

Our meetings this quarter will be held on Zoom. Please sign up for the listserv to receive the Zoom link (instructions in sidebar).


Jeffrey Lamontagne

Finding Grammar Amidst Optionality and Opacity: High-vowel tenseness in Laurentian French
Laurentian French (also commonly called Canadian French, Quebec French or Québécois) is characterised by a complex combination of processes affecting the tense/lax quality of high vowels. Laxing in word-final syllables is completely predictable, but laxing in non-final syllables combines optionality and opacity through harmony (local and non-local), disharmony, retensing, and vowel deletion. While laxing processes have received considerable attention in the literature (e.g. Dumas, 1987; Poliquin, 2006; Fast, 2008; Bosworth, 2011), all quantitative data currently available come from acceptability judgments collected by Poliquin, rather than from production. The lack of production data stems from the fact that tense and lax high vowels cannot be classified using only one or two acoustic dimensions (Arnaud et al., 2011; Sigouin, 2013).
In collaboration with Peter Milne, I trained a forced aligner on tense and lax high vowels in final syllables (where tenseness is fully predictable) and used it to classify tokens in non-final syllables, thereby creating the first corpus annotated for high-vowel tenseness. Drawing on 24,000 words with high vowels in non-final syllables, I refute Poliquin’s (2006) proposal that learners have insufficient input to generate a grammar that includes the phonological processes affecting high-vowel tenseness. I demonstrate that the community-level grammar a learner is expected to acquire largely reflects the broad processes proposed in the literature, but that certain aspects of those processes differ from previous descriptions (e.g. the directionality of local harmony). Finally, I argue that these processes are phonological in nature; they cannot be explained purely in terms of undershoot or (non-phonologised) coarticulation.
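The general strategy (train where the labels are rule-given, then label the ambiguous cases) can be sketched as follows; this is an illustrative stand-in using a generic classifier, not the forced-aligner pipeline used in the talk, and the file and column names are hypothetical.

```python
# Sketch: learn tense vs. lax acoustics from final syllables, where tenseness
# is fully predictable, then annotate non-final syllables.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

acoustic_cols = ["f1_hz", "f2_hz", "f3_hz", "duration_ms"]   # hypothetical
tokens = pd.read_csv("high_vowel_tokens.csv")                # hypothetical

final = tokens[tokens["syllable_position"] == "final"]       # labels given by rule
nonfinal = tokens[tokens["syllable_position"] == "non-final"]

clf = GradientBoostingClassifier(random_state=0)
clf.fit(final[acoustic_cols], final["tenseness"])            # "tense" / "lax"

# Annotate the previously unlabeled non-final tokens and save the corpus.
nonfinal = nonfinal.assign(predicted_tenseness=clf.predict(nonfinal[acoustic_cols]))
nonfinal.to_csv("high_vowel_tokens_annotated.csv", index=False)
```

Training only on final syllables sidesteps the circularity problem: the classifier never sees hand labels for the very tokens whose tenseness is at issue.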

Online talk May 20: Melissa Baese-Berk (University of Oregon)

Our meetings this quarter will be held on Zoom. Please sign up for the listserv to receive the Zoom link (instructions in sidebar).

Perception of and adaptation to non-native and unfamiliar speech

Listening to unfamiliar speech, including non-native speech, often results in substantial challenges for listeners. The consequences of these challenges are far-reaching (i.e., costs for comprehension, memory, and other downstream processing), and increased costs for listening to unfamiliar speech exist even when the speech is fully intelligible (e.g., McLaughlin & Van Engen, 2020). I will present a series of studies aimed at investigating what makes perception of non-native speech especially challenging and what factors impact adaptation to this speech. I will also show new data suggesting that social factors, in addition to linguistic properties, can impact adaptation to unfamiliar speech.

Talk March 4: Matt Goldrick

Modeling Liaison using Gradient Symbolic Representations

(Joint work with Paul Smolensky and Eric Rosen, Johns Hopkins University & Microsoft Research)

The Gradient Symbolic Computation framework claims that the mental representations underlying speech are abstract, symbolic, and continuous, such that different symbolic constituents can be present within a structure to varying degrees. I’ll discuss how this framework can be used to model the distribution of liaison consonants in French, proposing an algorithm that learns the relative activation of symbolic constituents.
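As a very rough illustration of the core idea (a toy, not the model presented in the talk), one can think of a liaison consonant as a constituent stored with partial activation; whether it is pronounced depends on how much activation the context contributes. All numbers and the threshold below are invented for illustration.

```python
# Toy illustration of gradient symbolic constituents (invented numbers; not
# the talk's model). A liaison consonant carries partial activation at the end
# of word 1 and may receive further activation from a following vowel-initial
# word; it surfaces only if the combined activation clears a threshold.
THRESHOLD = 1.0   # hypothetical

def liaison_surfaces(final_c_activation, following_context_activation):
    """Return True if the partially activated liaison consonant is pronounced."""
    return (final_c_activation + following_context_activation) >= THRESHOLD

# "les amis": partial /z/ of "les" plus support from the vowel-initial word
print(liaison_surfaces(0.6, 0.5))   # True  -> liaison pronounced
# "les copains": consonant-initial word contributes no extra activation
print(liaison_surfaces(0.6, 0.0))   # False -> liaison not pronounced
```

The learning problem the abstract mentions is then, roughly, to set those activation values so that the model reproduces where liaison does and does not occur across the lexicon.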

Talk Feb 5: Ann Bradlow

Global language systems and phonetics

I will present two approaches to language typology and classification that I believe are relevant for our understanding of speech production and perception in the context of extensive multilingualism and language/dialect contact.  Specifically, I will briefly outline (1) the distinction between “Esoteric Languages” and “Exoteric Languages” as discussed in Lupyan and Dale (2010, Language Structure Is Partly Determined by Social Structure, PLoS ONE 5(1): e8559), and (2) the “Global Language System” as developed in de Swaan (2002, Words of the World: The Global Language System). Together, these two views raise a number of issues and questions that are potentially instructive for the evolving field of experimental and corpus phonetics.

Talk Jan 15: Uriel Cohen Priva

Understanding lenition through its causal structure

Consonant lenition refers to a list of seemingly unrelated processes that are grouped together by their tendency to occur in similar environments (e.g. intervocalically) and under similar conditions (e.g. in faster speech). These processes typically include degemination, voicing, spirantization, approximantization, tapping, debuccalization, and deletion (Hock 1986). So, we might ask: what are the commonalities among all these processes, and why do they happen? Different theories attribute lenition to assimilation (Smith 2008), effort-reduction (Kirchner 1998), phonetic undershoot (Bauer 2008), prosodic smoothing (Katz 2016), and low informativity (Cohen Priva 2017). We argue that it is worthwhile to focus on variable lenition (pre-phonologized processes) in conjunction with two phonetic characteristics of lenition: reduced duration and increased intensity. Using mediation analysis, we find causal asymmetries between the two, with reduced duration causally preceding increased intensity. These results are surprising, as increased intensity (increased sonority) is often regarded as the defining property of lenition. The results not only simplify the assumptions associated with effort-reduction, prosodic smoothing, and low informativity, but they are also compatible with phonetic undershoot accounts.
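For readers unfamiliar with mediation analysis, the product-of-coefficients logic can be sketched as below; the simulated data and the use of statsmodels are assumptions for illustration, not the study's data or code.

```python
# Minimal mediation-analysis sketch (simulated data, not the study's).
# Question: does a leniting environment shorten duration, which in turn
# raises intensity (X -> M -> Y), beyond any direct X -> Y effect?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
env = rng.integers(0, 2, n)                              # 1 = leniting environment
duration = 80 - 15 * env + rng.normal(0, 10, n)          # env shortens duration
intensity = 60 - 0.2 * duration + rng.normal(0, 3, n)    # shorter -> more intense
df = pd.DataFrame({"env": env, "duration": duration, "intensity": intensity})

a = smf.ols("duration ~ env", df).fit().params["env"]    # path X -> M
model_y = smf.ols("intensity ~ env + duration", df).fit()
b = model_y.params["duration"]                           # path M -> Y, given X
direct = model_y.params["env"]                           # path X -> Y, given M
print(f"indirect effect (a*b) = {a * b:.2f}, direct effect = {direct:.2f}")
# A sizeable indirect effect alongside a negligible direct effect is the
# signature of duration mediating the environment's effect on intensity.
```

Running the analysis in both directions (duration as mediator of intensity, and vice versa) is what allows an asymmetry of the kind described above to be detected.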

Talk January 8: Anne Pycha

Our next presentation will be by Anne Pycha (University of Wisconsin, Milwaukee) on January 8th, 2020 at 4pm in Cresap 101. As usual, it will be followed by a happy hour at Stacked & Folded Evanston. Here are the title and abstract:

Categoricity of segment representations depends upon word context

Exemplar theories and rule-based theories often make opposing predictions about the nature of segment representations. Exemplar theories predict strong categoricity for segments at morpheme or word boundaries: such segments occur in many environments, so their exemplar clouds include a range of phonetic variants over which listeners can generalize. Rule-based theories, on the other hand, predict strong categoricity for segments that participate in contrast or phonological rules, because these segments interact with other segments regardless of phonetic variation. In two studies, we tested these differing predictions by asking American English listeners to judge differences among phonetic variants of consonants occurring in different word contexts: a) at morpheme boundaries without rules, b) at morpheme boundaries with rules, c) internally without rules, and d) internally with rules. Preliminary results show that listeners are less sensitive to phonetic variation when the consonant occurs at a morpheme boundary, suggesting that the representations of these consonants are more categorical, in line with the predictions of exemplar theory.