Dissecting, understanding and manipulating the human voice


A team of computer scientists and linguists from Bielefeld University and Paderborn University has investigated how the different components of human speech can be separated from one another and thus better analysed and modified. The results feed into the research of TRR subproject C06 "Technically enabled explaining of speaker traits".

"The human voice is a complex construct made up of superimpositions of various influencing factors. As a result, it has different characteristics that are difficult to identify," says Professor Dr Reinhold Häb-Umbach, Professor of Communications Engineering at Paderborn University and one of the leaders of subproject C06. "By breaking down speech signals into different components, we can learn more about what makes our voices unique."

The components fall into two groups: linguistic-content properties (what someone says) and tonal properties (how the voice sounds while saying it). In their publication, the researchers show how the individual components are connected at the tonal level. To do this, they built a neural network model that separates the different tonal aspects from one another. The model can then be used to generate new synthetic speech with specifically modified properties, for example a desired average pitch.

The researchers presented the results in their article "Speech Disentanglement for Analysis and Modification of Acoustic and Perceptual Speaker Characteristics". "With the publication, we contribute to understanding how we can use computers to understand and modify different aspects of speech," summarises Frederik Rautenberg, co-author of the article and also a researcher in subproject C06. "This will allow us to develop speech modification programmes that can help people with speech difficulties, for example."

The article was presented at the 49th Annual Conference on Acoustics (DAGA). DAGA is the largest acoustics conference in the German-speaking world and was held in Hamburg from 6 to 9 March.


Project C06 "Technically enabled explaining of speaker traits"

In its research, subproject C06 examines voice characteristics and how they can be manipulated by computer. The goal is to develop an intelligent system that experts can use to explain the phenomenon of voice to laypeople.


Photo (TRR 318): Frederik Rautenberg, research assistant in subproject C06.

Contact

Prof. Dr. Reinhold Häb-Umbach

Communications Engineering / Heinz Nixdorf Institute

Head of Department of Communications Engineering

Phone: +49 5251 60-3626

Frederik Rautenberg

Communications Engineering / Heinz Nixdorf Institute

Research & Teaching

Phone: +49 5251 60-3680