I’m not sure I can distinguish and understand the difference between:
VAD (Voice Activity Detection) and Speaker Segmentation.
I understand that:
- VAD - splits audio into segments of speech and non-speech
- Speaker Segmentation - splits audio into segments of non-speech and segments of speech by different speakers
For example:
VAD = [not speech, speech, not speech, speech, not speech]
Speaker Segmentation = [not speech, speech, not speech, speech A, speech B, not speech]
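To check that the two example lists are consistent with each other, here is a minimal sketch. It assumes each output is just an ordered list of segment labels with no timestamps, and `to_vad` is a hypothetical helper, not part of any library. Under that reading, collapsing every speaker label to plain "speech" and merging adjacent equal labels should turn the speaker-segmentation list into the VAD list.

```python
from itertools import groupby

NON_SPEECH = "not speech"


def to_vad(speaker_segments):
    """Collapse a speaker-segmentation label list into a VAD label list.

    Hypothetical helper: every label other than NON_SPEECH is treated as
    speech, then adjacent segments with the same resulting label are merged.
    """
    labels = [NON_SPEECH if s == NON_SPEECH else "speech" for s in speaker_segments]
    # groupby yields one group per run of equal consecutive labels
    return [label for label, _ in groupby(labels)]


vad = ["not speech", "speech", "not speech", "speech", "not speech"]
speaker_segmentation = [
    "not speech", "speech", "not speech", "speech A", "speech B", "not speech",
]

print(to_vad(speaker_segmentation) == vad)  # True
```

This only shows that the two example lists agree under that assumption. It says nothing about how either sequence is actually produced from audio.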
Am I right?
Is my example correct?