Kaldi Voice Activity Detection (VAD)

Unfortunately, I don’t know how Kaldi detects speech, or whether it uses a filtering algorithm or some kind of machine learning model. If Kaldi’s VAD works for your use case, you could stick with it and preprocess the data that way. I’m not sure whether you would like to reimplement Kaldi’s algorithm in torchaudio, or how the two should be combined.
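
To make the "preprocess the data" idea a bit more concrete, here is a minimal sketch of a generic energy-threshold VAD in plain Python. This is an assumption on my side and not Kaldi's actual algorithm: the function names (`frame_log_energy`, `energy_vad`, `keep_speech`), the 25 ms / 10 ms framing at 16 kHz, and the threshold of -5.0 are all made up for illustration and would need tuning on real data.

```python
import math


def frame_log_energy(samples, frame_len, hop_len):
    """Return the log energy of each full frame of a mono signal."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        # Small floor so that digital silence does not produce log(0).
        energies.append(math.log(energy + 1e-10))
    return energies


def energy_vad(samples, frame_len=400, hop_len=160, threshold=-5.0):
    """Mark a frame as speech when its log energy exceeds `threshold`.

    The defaults correspond to 25 ms frames with a 10 ms hop at 16 kHz.
    The threshold is a hypothetical value, not taken from Kaldi.
    """
    energies = frame_log_energy(samples, frame_len, hop_len)
    return [e > threshold for e in energies]


def keep_speech(samples, decisions, frame_len=400, hop_len=160):
    """Keep only the samples covered by at least one speech frame."""
    keep = [False] * len(samples)
    for i, is_speech in enumerate(decisions):
        if is_speech:
            start = i * hop_len
            for j in range(start, min(start + frame_len, len(samples))):
                keep[j] = True
    return [x for x, k in zip(samples, keep) if k]
```

You would call `energy_vad(samples)` on a list of float samples and pass the resulting per-frame decisions to `keep_speech` to drop the silent parts before feature extraction. The same per-frame logic should translate to tensor operations if you want to keep everything inside a PyTorch pipeline, but I haven't checked how closely it would match Kaldi's output.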