Speech Signal Improvement Challenge - ICASSP2024

Task Describtion

This project was part of the Speech Signal Improvement Challenge 2024, where we participated in the Real-time track. The main objective was to improve speech signal quality in real-time, targeting environments with diverse noise sources and challenging acoustics.

Background

The challenge evaluated speech enhancement systems in both real-time and non-real-time tracks. The real-time track, in particular, focused on improving speech quality for communication systems in noisy environments, considering metrics like Word Accuracy (WAcc) and ITU-T P.835 subjective evaluations. We developed an advanced deep learning model that successfully enhanced speech signals in real-time, addressing issues like low signal-to-noise ratio (SNR), reverberation, and interference from multiple speakers.

Proposed Solution

Frequency Rolling (FR) : This process effectively rolled the frequency axis into the channel axis, allowing grouped convolutions to handle the frequencies separately with a reduced computational footprint.
Frequency-wise Self-Attention with Time-wise LSTM for Causality

Outcome

We ranked 5th out of 13 teams in the Real-time Track.

My Contributions

I applied Frequency Rolling (FR) because I found it effective in reducing computational complexity without significantly degrading performance. This approach has been successfully used in previous research on music separation, where rolling the frequency axis into the channel axis improved efficiency while preserving the model’s capacity to handle frequency domain information.

Prev Next Top