How do you transcribe multiple speakers speaking at different volume levels?
Answers
Answer:
Speaker diarization
Speech-to-Text can recognize multiple speakers in the same audio clip. When you send an audio transcription request to Speech-to-Text, you can include a parameter that tells it to identify the different speakers in the audio sample. This feature, called speaker diarization, detects when the speaker changes and labels each detected voice with a number.
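As a rough illustration of the parameter mentioned above, here is a minimal sketch of a request body with diarization turned on. It assumes the answer refers to Google Cloud Speech-to-Text and its v1 REST `speech:recognize` method, where diarization is configured through `diarizationConfig`. The helper name `build_diarization_request`, the default speaker counts, and the bucket path are hypothetical, so check the field names against the current API reference before relying on them.

```python
import json


def build_diarization_request(gcs_uri, min_speakers=2, max_speakers=6,
                              language_code="en-US"):
    """Build a JSON-serializable request body with speaker diarization on.

    Field names are assumed to follow the v1 REST API. The speaker counts
    are only hints about how many voices to expect in the audio.
    """
    if min_speakers < 1 or max_speakers < min_speakers:
        raise ValueError("expected 1 <= min_speakers <= max_speakers")
    return {
        "config": {
            "languageCode": language_code,
            "diarizationConfig": {
                "enableSpeakerDiarization": True,
                "minSpeakerCount": min_speakers,
                "maxSpeakerCount": max_speakers,
            },
        },
        # Hypothetical Cloud Storage location of the recording.
        "audio": {"uri": gcs_uri},
    }


if __name__ == "__main__":
    body = build_diarization_request("gs://example-bucket/meeting.wav")
    print(json.dumps(body, indent=2))
```

The resulting dictionary could then be sent as the JSON payload of the recognize call with whatever HTTP client and credentials you already use.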
When you enable speaker diarization in your transcription request, Speech-to-Text attempts to distinguish the different voices in the audio sample. The transcription result tags each word with the number assigned to its speaker, so words spoken by the same speaker carry the same number. A result can contain as many distinct speaker numbers as Speech-to-Text can uniquely identify in the audio sample.
When you use speaker diarization, Speech-to-Text produces a running aggregate of all the results provided in the transcription. Each result includes the words from the previous result. Thus, the words array in the final result provides the complete, diarized results of the transcription.
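To make the "final result" behaviour concrete, the sketch below reads only the last result of a response and groups consecutive words by speaker number. It assumes the response has already been parsed into a Python dictionary shaped like the v1 REST JSON, with `results`, `alternatives`, `words`, `word`, and `speakerTag` keys. The function name `speaker_turns` and the sample data are made up for illustration.

```python
def speaker_turns(response):
    """Return a list of (speaker_number, text) turns from a diarized response.

    Only the final result is read, because with diarization enabled its
    words array holds the complete, speaker-tagged transcript.
    """
    results = response.get("results", [])
    if not results:
        return []
    alternatives = results[-1].get("alternatives", [])
    if not alternatives:
        return []

    turns = []  # each entry: [speaker_number, [word, word, ...]]
    for info in alternatives[0].get("words", []):
        tag = info.get("speakerTag", 0)  # 0 here means "no tag present"
        if turns and turns[-1][0] == tag:
            turns[-1][1].append(info["word"])  # same speaker keeps talking
        else:
            turns.append([tag, [info["word"]]])  # speaker changed
    return [(tag, " ".join(words)) for tag, words in turns]


if __name__ == "__main__":
    # Made-up response: the last result repeats the earlier words with tags.
    sample = {
        "results": [
            {"alternatives": [{"transcript": "hello there"}]},
            {"alternatives": [{"words": [
                {"word": "hello", "speakerTag": 1},
                {"word": "there", "speakerTag": 1},
                {"word": "hi", "speakerTag": 2},
                {"word": "thanks", "speakerTag": 1},
            ]}]},
        ]
    }
    for tag, text in speaker_turns(sample):
        print(f"Speaker {tag}: {text}")
```

Grouping by consecutive words rather than by speaker overall keeps the turns in the order they were spoken, which is usually what a readable transcript needs.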
Review the language support page to check whether this feature is available for your language.