Real-time Meeting Minutes Transcription System
Task Description
The system records meeting minutes in real time and attributes each part of the dialogue to its speaker. The process starts by separating overlapping speakers with Blind Source Separation, followed by Speaker Diarization, which splits the audio into segments and labels each segment with its speaker. Finally, Automatic Speech Recognition converts the labeled speech segments into text.
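The three stages can be pictured as a simple chain. The sketch below is only an illustration of that data flow, not the project's actual code: `Segment`, `Utterance`, `transcribe_chunk`, and the three stage callables (`separate`, `diarize`, `recognize`) are hypothetical names standing in for whatever separation, diarization, and recognition models are used.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Audio = Sequence[float]  # placeholder type: mono samples of one audio chunk


@dataclass
class Segment:
    """A stretch of audio attributed to one speaker (hypothetical type)."""
    speaker: str
    start: float  # seconds
    end: float    # seconds
    audio: Audio


@dataclass
class Utterance:
    """One line of the minutes: who spoke, when, and what was said."""
    speaker: str
    start: float
    end: float
    text: str


def transcribe_chunk(
    mixture: Audio,
    separate: Callable[[Audio], List[Audio]],   # Blind Source Separation stage
    diarize: Callable[[Audio], List[Segment]],  # Speaker Diarization stage
    recognize: Callable[[Audio], str],          # Automatic Speech Recognition stage
) -> List[Utterance]:
    """Run BSS, then SD, then ASR on one chunk of mixed meeting audio."""
    utterances: List[Utterance] = []
    for source in separate(mixture):
        for seg in diarize(source):
            text = recognize(seg.audio)
            utterances.append(Utterance(seg.speaker, seg.start, seg.end, text))
    # Order the minutes by start time so they read chronologically.
    return sorted(utterances, key=lambda u: u.start)
```

Passing the stages in as callables keeps the sketch independent of any particular model; in a real-time setting, `transcribe_chunk` would be called repeatedly on short consecutive chunks.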
Background
A key challenge was achieving high accuracy both in separating the different speakers and in transcribing their speech. There was also a tension between following existing research and developing a new algorithm that would make better use of the separated audio sources.
Proposed Solution
The system transcribes meetings in real time by combining Blind Source Separation (BSS), Speaker Diarization (SD), and Automatic Speech Recognition (ASR).
After discussions, a rule-based machine learning algorithm called “Add and Register” was created. The algorithm uses the similarity between speaker embeddings to recognize different speakers, so that each part of the meeting is transcribed under the correct speaker.
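The details of “Add and Register” are not given here, so the following is only a minimal sketch of one plausible reading. It assumes cosine similarity as the similarity measure, a fixed threshold, and a running-mean centroid per speaker: an embedding that is close enough to a known speaker is added to that speaker, and otherwise a new speaker is registered. The names `SpeakerRegistry`, `assign`, and the threshold value of 0.7 are illustrative assumptions, not the project's implementation.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SpeakerRegistry:
    """Hypothetical add-or-register rule over speaker embeddings."""

    def __init__(self, threshold: float = 0.7) -> None:
        self.threshold = threshold              # assumed similarity cut-off
        self.centroids: List[List[float]] = []  # one mean embedding per speaker
        self.counts: List[int] = []             # embeddings seen per speaker

    def assign(self, embedding: Sequence[float]) -> int:
        """Return the speaker id for this embedding, registering a new one if needed."""
        best_id, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine_similarity(embedding, centroid)
            if sim > best_sim:
                best_id, best_sim = i, sim

        if best_id >= 0 and best_sim >= self.threshold:
            # "Add": fold the embedding into the matched speaker's running mean.
            n = self.counts[best_id]
            self.centroids[best_id] = [
                (c * n + x) / (n + 1)
                for c, x in zip(self.centroids[best_id], embedding)
            ]
            self.counts[best_id] = n + 1
            return best_id

        # "Register": no known speaker is similar enough, so create a new one.
        self.centroids.append(list(embedding))
        self.counts.append(1)
        return len(self.centroids) - 1
```

In use, each diarized segment's embedding would be passed to `assign`, and the returned id would label the corresponding line of the transcript. The threshold trades off splitting one speaker into several ids (too high) against merging different speakers (too low), and would need tuning for the chosen embedding model.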
Outcome
Demo
- Awarded in two categories at the 2022 Sogang Convergence Technology Contest by Sogang University and Naver.
My Contributions
- Developed an efficient speaker diarization system using speaker embeddings.
- Integrated the speech separation and speech recognition systems for meeting minutes transcription.