Real-time Meeting Minutes Transcription System

Task Description

The system transcribes meetings in real time and attributes each utterance to its speaker. The pipeline first separates the different speakers using Blind Source Separation. Speaker Diarization then splits the separated audio into segments and labels each segment with its speaker. Finally, Automatic Speech Recognition converts each labeled speech segment into text.
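The three stages above can be sketched as a single function that chains them. This is only an illustration of the data flow, not the project's actual code: the `Segment` and `Utterance` types, the `transcribe_chunk` function, and the three stage callables (`separate`, `diarize`, `recognize`) are hypothetical names introduced here, and real models would be plugged in where the callables are passed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Audio = Sequence[float]  # mono samples; a stand-in for a real audio buffer


@dataclass
class Segment:
    speaker: str   # label assigned by diarization
    start: float   # segment start, in seconds
    end: float     # segment end, in seconds
    audio: Audio   # samples belonging to this segment


@dataclass
class Utterance:
    speaker: str
    start: float
    end: float
    text: str


def transcribe_chunk(
    mixture: Audio,
    separate: Callable[[Audio], List[Audio]],    # BSS stage (hypothetical)
    diarize: Callable[[Audio], List[Segment]],   # SD stage (hypothetical)
    recognize: Callable[[Audio], str],           # ASR stage (hypothetical)
) -> List[Utterance]:
    """Run one chunk of mixed audio through BSS, then SD, then ASR."""
    utterances: List[Utterance] = []
    for source in separate(mixture):
        for seg in diarize(source):
            text = recognize(seg.audio)
            utterances.append(Utterance(seg.speaker, seg.start, seg.end, text))
    # Sort by start time so the minutes read chronologically.
    return sorted(utterances, key=lambda u: u.start)
```

Passing the stages in as callables keeps the sketch independent of any particular separation, diarization, or recognition model.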

Background

A key challenge was achieving high accuracy in both separating the speakers and transcribing their speech. There was also a tension between following existing research and developing a new algorithm that would make better use of the separated audio sources.

Proposed Solution

The system transcribes meetings in real time by combining Blind Source Separation (BSS), Speaker Diarization (SD), and Automatic Speech Recognition (ASR).

After discussions, a rule-based machine learning algorithm called “Add and Register” was created. It recognizes the different speakers by the similarity of their speaker embeddings and attributes each part of the transcript to the matching speaker.
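The text does not spell out the rules of “Add and Register”, so the following is one plausible reading, offered as a minimal sketch rather than the actual algorithm. It assumes that each incoming speaker embedding is compared to the registered speakers by cosine similarity; if the best match clears a threshold, the embedding is added to that speaker, otherwise a new speaker is registered. The `SpeakerRegistry` class, the `assign` method, and the threshold value of 0.7 are all hypothetical.

```python
import math
from typing import List, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("embedding must not be the zero vector")
    return [x / norm for x in v]


class SpeakerRegistry:
    """Assumed reading of "Add and Register": match by similarity, else enroll."""

    def __init__(self, threshold: float = 0.7) -> None:
        self.threshold = threshold               # hypothetical cut-off; needs tuning
        self.centroids: List[List[float]] = []   # one unit-length centroid per speaker
        self.counts: List[int] = []              # embeddings folded into each centroid

    def assign(self, embedding: Sequence[float]) -> int:
        """Return the speaker index for this embedding, registering it if new."""
        e = _unit(embedding)
        best, best_sim = -1, -math.inf
        for i, c in enumerate(self.centroids):
            sim = sum(a * b for a, b in zip(c, e))  # cosine similarity
            if sim > best_sim:
                best, best_sim = i, sim

        if best >= 0 and best_sim >= self.threshold:
            # "Add": fold the embedding into the matched speaker's centroid
            # (count-weighted average, then re-normalised).
            n = self.counts[best]
            merged = [(a * n + b) / (n + 1) for a, b in zip(self.centroids[best], e)]
            self.centroids[best] = _unit(merged)
            self.counts[best] = n + 1
            return best

        # "Register": no speaker is similar enough, so enroll a new one.
        self.centroids.append(e)
        self.counts.append(1)
        return len(self.centroids) - 1
```

For example, with two-dimensional toy embeddings, `assign([1.0, 0.0])` registers speaker 0, `assign([0.9, 0.1])` is added to speaker 0, and `assign([0.0, 1.0])` registers speaker 1. Real speaker embeddings have far more dimensions, and the threshold would need to be tuned on recorded meetings.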

Outcome

Demo

My Contributions