Technically Enabled Explaining of Speaker Traits
Overview
The speech signal is a rich source of information: beyond its linguistic content, it carries what is termed para- or extralinguistic content, revealing a speaker’s identity, gender, emotional or cognitive state, age, and health. These traits have been the subject of many investigations in phonetics, but because the underlying dimensions are highly complex, such studies are often confined to highly controlled datasets whose findings do not generalize. Practical knowledge about the phonetics of speaker characteristics is also indispensable for voice practitioners such as speech therapists, actors, or public speakers. Whereas speech technology is able to classify and even disentangle the complex signals underlying speech characteristics, the discipline has so far not provided interpretable models that help phonetic experts transfer knowledge to non-expert voice practitioners. Our project will therefore examine whether technical solutions can be developed as tools to support the generation of explanations within speech science. Specifically, we argue that the phonetic realization of a dimension of phonetic variation can be pinpointed much better if two speech samples are generated that contain the same linguistic content and differ only in the manifestation of a single trait. These explanations should ultimately enable voice practitioners to either identify or mimic the paralinguistic dimensions of interest.
Key Facts
- Project duration: 01/2021 - 12/2025
- Funded by: DFG