Source Localization and Speech Enhancement in Robot Vacuum Cleaner

Task Describtion

Developing a direction detection system for single-source audio in extreme noise environments, designed for robot vacuums equipped with multiple microphones.

Background

In noisy environments, especially in hardware-constrained robot vacuums, it is challenging to accurately detect the direction of a single audio source while reducing noise in real-time.

Proposed Solution

System: Designed an integrated system that efficiently combines noise reduction and direction detection algorithms in real-time.
Noise Reduction Module: Developed a lightweight noise reduction model (Tiny Recurrent U-Net) optimized for low-latency real-time operation on limited hardware.
Source Localization Module: Enhanced direction detection by utilizing the output of the noise reduction module, applying a Signal-to-Noise Ratio mask and Recursive Least Squares for noise estimation.

Outcome

Demo

[Journal, Patent] J. -H. Kim, Taehan Kim, S. -H. Kim, J. -M. Song, Y. -J Park, Hyung-Min Park. A Real-Time Sound Source Localization System for Robotic Vacuum Cleaner with a Microphone Array.
Submitted to IEEE Sensors Journal*(2024) /Applying for KR, US Patent(2024W)
Ongoing collaboration with LG Electronics to apply single-channel noise enhancement technologies to robot vacuum systems.

My Contributions

Developed a real-time speech enhancement model optimized for limited hardware.
- Enhancement performance significantly drops in low SNR conditions due to the difference in meaningful information content according to information theory.
- Especially, the consonants and high-frequency components of vowels either have low energy or are widely dispersed, making them easily buried under strong noise, resulting in minimal information transmission.
- However, in the case of vowels with concentrated energy and low frequencies, the enhancement model can effectively retain these parts.
- For tasks like source localization, which require accurate speech information, it’s necessary to identify the parts where the speech direction is clearly preserved.
- By leveraging the characteristic where vowels with concentrated energy and low frequencies are preserved, even though overall enhancement performance might be lacking, the model can be utilized to identify clear speech segments.
Contributed to improving the direction detection algorithm by utilizing noise reduction results.
- The traditional SNR-based masks used for source localization have certain issues:
  - Even if speech is present, if surrounding energy is strong, incorrect masks can be generated for direction detection.
- To preserve parts where speech energy is dominant, we tracks the maximum values of the speech enhancement output and directly generates the mask from these values.

Prev Next Top