The voice is a multifaceted tool, central to the human experience. Whether identifying a speaker, understanding a message, or enjoying a song, listeners seamlessly map acoustic properties onto meaning(s). Drawing on insights from multiple disciplines, we propose an interactionist framework to understand the cognitive processes that shape our interpretation of vocal sounds. We employ a diverse methodological toolkit, including behavioral experiments, acoustic analyses, electroencephalography (EEG), and advanced statistical modeling, to acoustically characterize mental representations of abstract categories (e.g., correctness, beauty, or humanness) and uncover the cognitive processes at play, with a special focus on individual differences in how we make sense of sounds.

Here are some examples of current research projects focusing on identity, communication, and aesthetics.
Identity
With the emergence of Text-To-Speech (TTS) technology and its increasing use in our daily lives, voices can now be described in terms of their “humanness.” While this term is not a new dictionary entry, its current use represents a relatively recent development in human history. Until now, the voice was inherently human, as it was produced by the human vocal tract; this is no longer the case.
When asked, listeners do not always agree on whether a voice has a human source and can indeed be fooled. However, their performance in identifying a voice’s source is significantly better than chance (Bruder, Breda, & Larrouy-Maestri, 2025), which indicates that listeners can distinguish between human- and machine-generated voices. In our latest work, we have made two key observations: the perception of a voice’s humanness is significantly affected by speech content, and AI voices still possess a characteristic prosodic contour that distinguishes them from human speech (Wester & Larrouy-Maestri, under review). Building on these findings, we are now working to determine the neural mechanisms behind “humanness” processing in collaboration with Nadine Lavan and Mathias Scharinger. This project is made possible thanks to the generous support of the Max Planck Society and an Imminent Research Grant.



Language/communication
Tone of voice is a fundamental channel for communicating emotional states and intentions. While the acoustic features of prosody have been studied extensively since Banse & Scherer’s (1996) seminal work, the mechanisms underlying this communication are still not fully understood. In a recent interdisciplinary review spanning psychology, linguistics, acoustics, and more clinically oriented fields, we synthesized approaches and findings to better describe the acoustic correlates of emotional prosody.

Although most listeners seem to share the ‘code’ (that is, they adequately interpret a prosodic signal) needed to access speakers’ emotions and intentions, misunderstandings easily occur. By analysing a large set of existing corpora, we identified some of the crucial factors affecting acoustic-to-meaning mappings in emotion communication.

Our recent work investigates the cognitive processes underlying prosody comprehension, specifically how we integrate dynamic acoustic cues to understand intentions, such as irony (Larrouy-Maestri et al., 2023), criticism, or suggestion (Larrouy-Maestri et al., under review). In addition to identifying a crucial level of pitch information (within a two-dimensional space based on contour direction and spikiness), we (together with Hanna Ringer, Daniela Sammler, and David Poeppel) have identified distinct listener profiles that not only illuminate the sources of everyday miscommunication but also provide valuable insights for clinical applications.


Aesthetics
Perception of correctness
Listeners can easily say if a singer sounds in tune or out of tune (Larrouy-Maestri et al., 2013, 2015), with some tolerance (Larrouy-Maestri, 2018). However, as with other subjective categories like beauty or obscenity, the fundamental basis for categorizing a performance as “correct” remains unclear. This project investigates the concept of “correctness” across different musical contexts and dimensions (pitch and time), as well as the cognitive processes underlying this categorization, by employing a combination of psychophysics, physiology, and electrophysiology methods.
Explore your own perception: You can quantify your own correctness thresholds (or those of your participants) with our short and engaging Mistuning Perception Test, described in Larrouy-Maestri, Harrison, and Müllensiefen (2019).



From correctness to preferences
Previous studies showed that lay and expert listeners share similar definitions of what is “correct” when listening to untrained (Larrouy-Maestri, Magis, Grabenhorst, & Morsomme, 2015) and trained singers (Larrouy-Maestri, Morsomme, Magis, & Poeppel, 2017), or to their own performances (Larrouy-Maestri, Wang, Vairo Nunes, & Poeppel, 2021). However, we usually don’t attend opera or pop concerts to evaluate the correctness of a performance, but rather to enjoy it.
Together with Camila Bruder and leveraging the valuable insights of our collaborators, we are examining what “preference” means in the context of sung performances and exploring the roots (and idiosyncrasies) of this aesthetic experience.


Next steps
The voice as a window into the brain: We investigate the strength and nature of acoustics-meaning mappings to reveal underlying cognitive and neural architecture.
Interactionist approach to voice(s) perception: Given the multifaceted nature of the voice and its significance across diverse (sometimes overlapping) domains, we plan to broaden the scope of voice investigation beyond a singular context.

Note about our group: In addition to embracing open-access science, we explore different approaches, such as large cross-cultural collaborations (see Ozaki et al., 2024) and interdisciplinary projects (e.g., with T. Ata Aydin, Melanie Wald-Fuhrmann, and Fredrik Ullén), to investigate the perceptual boundaries of different sound categories. Importantly, we are mindful of the biases introduced by experimental constraints and are committed to providing high-quality and ecologically valid stimulus sets (see Bruder & Larrouy-Maestri, 2025; Holz, Larrouy-Maestri, & Poeppel, 2021) and to testing innovative methodological approaches, such as free sorting tasks (Fink, Hörster, Poeppel, Wald-Fuhrmann, & Larrouy-Maestri, preprint), to ensure our experimental findings are meaningful.
Current collaborations with
Melanie Wald-Fuhrmann
Lauren Fink
David Poeppel
Daniela Sammler
Hanna Ringer
Fredrik Ullén
Pat Savage
Nadine Lavan
Mathias Scharinger
Neda Mousavi
Sarah Platte
Morten Schuldt Jensen
Dominique Morsomme
Angélique Remacle
Klaus Frieler
Xiangbin Teng
Zofia Holubowska
Current supervision of
Camila Bruder
Janniek Wester
T. Ata Aydin
Previous (most recent) supervision of
Jonathan Mueller-Gerbes
Zofia Holubowska
Zoe Nikolakis
Pol van Rijn
Madita Hörster
Natalie Holz
Sebastian Hessler
Cecilia Musci
Franziska Hanning
Francie Hoehler
Xinyue Wang
Simone Franz
Renan Vairo Nunes
Marion Hubin
