I’m not sure I understand the difference between VAD (Voice Activity Detection) and Speaker Segmentation.
I understand that:
- VAD: splits audio into speech and non-speech segments
- Speaker Segmentation: splits audio into non-speech segments and speech segments for each speaker
For example:
VAD = [not speech, speech, not speech, speech, not speech]
Speaker Segmentation = [not speech, speech, not speech, speech A, speech B, not speech]
Am I right?
Is my example correct?
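To make my understanding concrete, here is a minimal Python sketch. It assumes segments are `(start_sec, end_sec, speaker)` tuples, with `speaker = None` for non-speech. That representation, the function name `to_vad`, the timestamps, and the choice to label the first speech segment as speaker "A" are all hypothetical and do not come from any library. If my understanding is right, a VAD result should be recoverable from a speaker segmentation by dropping the speaker labels and merging neighbouring segments that end up with the same label:

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (start_sec, end_sec, speaker), speaker=None means non-speech.
Segment = Tuple[float, float, Optional[str]]


def to_vad(segments: List[Segment]) -> List[Tuple[float, float, str]]:
    """Collapse a speaker segmentation into VAD labels (speech vs. not speech),
    merging adjacent segments that end up with the same label."""
    vad: List[Tuple[float, float, str]] = []
    for start, end, speaker in segments:
        label = "not speech" if speaker is None else "speech"
        if vad and vad[-1][2] == label and vad[-1][1] == start:
            # Same label continues without a gap: extend the previous segment.
            vad[-1] = (vad[-1][0], end, label)
        else:
            vad.append((start, end, label))
    return vad


# My speaker segmentation example, with made-up timestamps.
segmentation: List[Segment] = [
    (0.0, 1.0, None),
    (1.0, 3.0, "A"),
    (3.0, 4.0, None),
    (4.0, 6.0, "A"),
    (6.0, 8.0, "B"),
    (8.0, 9.0, None),
]

for seg in to_vad(segmentation):
    print(seg)
```

Here the adjacent "speech A" and "speech B" segments merge into one "speech" segment, which gives the five-element VAD sequence from my example.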