Bandwidth extension of speech signals
Deep Learning for speech signal bandwidth extension in a radio communication context with unconventional sound capturing devices
Supervised student : Julien Hauret
Supervision : Éric Bavu and Thomas Joubaud
Project duration : Ongoing (2021 - …)
Funding : Co-funding ANR-IA and Institut Franco-Allemand de Recherches de Saint Louis
Abstract : Speech for radio communications is usually recorded with microphones placed near the speaker’s mouth. However, these conventional sound capture systems are sensitive to ambient noise, which significantly reduces the intelligibility of the speech captured by the transducer. Current solutions primarily rely on differential microphones, but recent systems developed by the APC team at ISL exploit **unconventional microphones** such as bone conduction transducers or in-ear microphones.
Figure : Example of an unconventional speech pickup device: in-ear pickup behind active/passive hearing protectors.
With these systems, the speaker is provided with appropriate hearing protection, and the speech signal is also less sensitive to ambient noise because the microphone is located inside the earplugs, which improves communication performance in difficult and noisy environments. However, speech recorded with these unconventional microphones is degraded by the acoustic path between the mouth and the transducers: with in-ear microphones, low frequencies are amplified and almost no acoustic signal is recorded above 2 kHz. This motivates the use of deep learning signal enhancement methods to extrapolate the missing high-frequency content.
Generative modeling of audio signals is a fundamental problem at the intersection of signal processing and machine learning. One of the most significant recent advances in AI-based audio processing has been the ability to model raw signals directly in the time domain using neural networks. In this project, we explore new modeling algorithms for audio. In particular, we focus on a specific audio generation problem called bandwidth extension, in which the task is to reconstruct a high-quality sound from a low-quality, undersampled input. From a practical point of view, this technique also has applications in telephony, compression, text-to-speech generation, forensic analysis of audio recordings, and other areas.
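To make the task concrete, the sketch below builds a toy (degraded input, wideband target) pair by low-pass filtering a synthetic signal at 2 kHz. It is only an illustration under stated assumptions, not the project's method: real in-ear recordings also amplify low frequencies and are not a simple low-pass of the clean speech, the 16 kHz sampling rate is assumed, and every function name here (`lowpass_taps`, `fir_filter`, `tone_amplitude`, `make_training_pair`) is hypothetical.

```python
import math

SAMPLE_RATE = 16_000  # Hz, assumed wideband rate (not stated in the text)
CUTOFF_HZ = 2_000     # Hz, approximate upper limit cited for in-ear pickup


def lowpass_taps(cutoff_hz, sample_rate, num_taps=101):
    """Hamming-windowed sinc FIR low-pass filter, normalized to unit DC gain."""
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(signal, taps):
    """Same-length convolution, treating samples outside the signal as zero."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + half - j
            if 0 <= idx < len(signal):
                acc += t * signal[idx]
        out.append(acc)
    return out


def tone_amplitude(signal, freq_hz, sample_rate):
    """Amplitude of one frequency component, estimated with a single-bin DFT."""
    w = 2 * math.pi * freq_hz / sample_rate
    re = sum(x * math.cos(w * n) for n, x in enumerate(signal))
    im = sum(x * math.sin(w * n) for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


def make_training_pair(wideband):
    """Return (band-limited input, wideband target) for supervised training."""
    taps = lowpass_taps(CUTOFF_HZ, SAMPLE_RATE)
    return fir_filter(wideband, taps), wideband


# Toy stand-in for speech: one tone below and one tone above the cutoff.
wideband = [
    math.sin(2 * math.pi * 500 * n / SAMPLE_RATE)
    + math.sin(2 * math.pi * 5000 * n / SAMPLE_RATE)
    for n in range(1600)
]
degraded, target = make_training_pair(wideband)

for freq in (500, 5000):
    print(freq, "Hz:",
          round(tone_amplitude(degraded, freq, SAMPLE_RATE), 3), "in input,",
          round(tone_amplitude(target, freq, SAMPLE_RATE), 3), "in target")
```

A bandwidth extension model would be trained to map `degraded` back to `target`, that is, to regenerate the 5 kHz component that the input no longer contains.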
Publications and communications related to the project
- Julien Hauret, Eric Bavu, Thomas Joubaud, and Véronique Zimpfer. Deep Learning pour l’amélioration de signaux vocaux captés avec des transducteurs intra-auriculaires, 16ème Congrès Français d’Acoustique, Marseille, Apr 2022.