Dear all,
I am using the Kaldi ASR toolkit. I extracted MFCC features from a speech dataset and stored them in .ark, .scp, and CMVN files. How can I train my network on these files?
Thanks
You could use a library like kaldiio to load these samples, create a custom Dataset, and pass it to a DataLoader, as explained in this tutorial.
Once the Dataset is ready, you can continue working on the architecture of your model.
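To make the Dataset/DataLoader idea concrete, here is a minimal sketch. It assumes the features come as a dict-like object mapping utterance IDs to (frames, coefficients) matrices, which is what `kaldiio.load_scp` returns. The class name `KaldiFeatureDataset`, the helpers `pad_collate` and `load_kaldi_feats`, and the one-integer-label-per-utterance setup are all hypothetical and only for illustration; your real targets (e.g. alignments or transcripts) will likely look different. It also does not apply the CMVN statistics.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class KaldiFeatureDataset(Dataset):
    """Map-style dataset over a dict-like {utterance_id: (frames, dims) matrix}."""

    def __init__(self, feats, labels):
        self.feats = feats      # e.g. the object returned by kaldiio.load_scp
        self.labels = labels    # hypothetical: {utterance_id: int class label}
        # Keep only utterances that have both features and a label.
        self.utt_ids = [utt for utt in feats if utt in labels]

    def __len__(self):
        return len(self.utt_ids)

    def __getitem__(self, idx):
        utt_id = self.utt_ids[idx]
        x = torch.as_tensor(self.feats[utt_id], dtype=torch.float32)
        return x, self.labels[utt_id]


def pad_collate(batch):
    """Zero-pad variable-length utterances to the longest one in the batch."""
    feats, labels = zip(*batch)
    lengths = torch.tensor([f.shape[0] for f in feats])
    padded = pad_sequence(feats, batch_first=True)  # (batch, max_frames, dims)
    return padded, lengths, torch.tensor(labels)


def load_kaldi_feats(scp_path):
    """Lazily read the matrices listed in a Kaldi .scp file."""
    import kaldiio  # imported here so the rest of the sketch works without it
    return kaldiio.load_scp(scp_path)
```

Usage would be something like `DataLoader(KaldiFeatureDataset(load_kaldi_feats("feats.scp"), labels), batch_size=16, shuffle=True, collate_fn=pad_collate)`, where `feats.scp` and `labels` are placeholders for your own files. The custom `collate_fn` is needed because utterances usually have different numbers of frames, so the default collation cannot stack them.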
I’m not completely sure how the data is stored, but since you are dealing with MFCC data, I assume you could treat it as “image” data?
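As a rough illustration of the "image" idea: an MFCC matrix of shape (frames, coefficients) can be given a single channel dimension and fed to a 2D convolution. The sizes below (200 frames, 13 coefficients, 16 output channels) are made-up values for the sketch, not something taken from your data.

```python
import torch
import torch.nn as nn

# Fake batch of MFCC features: (batch, frames, coefficients).
mfcc = torch.randn(8, 200, 13)

# Add a channel dimension so each utterance looks like a 1-channel image.
x = mfcc.unsqueeze(1)  # (batch, 1, frames, coefficients)

# padding=1 with a 3x3 kernel keeps the frame and coefficient dimensions unchanged.
conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
out = conv(x)  # (batch, 16, frames, coefficients)
```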
torchaudio supports ark and scp, offers MFCC, and also has a template for datasets with DataLoader