Hey,
I’m working with PyTorch and torchaudio on an audio dataset.
As part of my dataset loading and feature-extraction pipeline I’d like to apply a few transforms: resampling to a uniform sample rate, peak-normalizing the audio to 0 dBFS, and extracting various spectral features (e.g. MFCCs).
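For context, the normalization step I have in mind is roughly the following (a simplified sketch using a synthetic waveform; real code would load clips with torchaudio.load, and the resampling/MFCC steps would come from torchaudio.transforms):

```python
import torch

def peak_normalize(waveform: torch.Tensor) -> torch.Tensor:
    """Scale the waveform so its absolute peak sits at 0 dBFS (max |x| == 1.0)."""
    peak = waveform.abs().max()
    # Avoid dividing by zero on silent clips.
    return waveform / peak if peak > 0 else waveform

# Synthetic stand-in for a loaded clip: 1 channel, 1 second at 16 kHz.
wave = 0.25 * torch.randn(1, 16000)
normalized = peak_normalize(wave)
print(normalized.abs().max().item())  # 1.0 (up to float rounding)
```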
To my understanding so far, all of this can be defined nicely with Dataset and DataLoader.
My question is about how to serialize the results of these transforms so they can be cached for future use, rather than applied on the fly every time I iterate over the dataset again, e.g. when developing different types of classifiers/models that use the data.
I assume this is a very common use case across audio, vision, and other data pipelines, but I couldn’t find an immediate answer. I can of course roll my own code to save results to disk and load them back, but this seems like natural functionality to have in the dataset interface. Apologies if this is a redundant question!
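To make the question concrete, here’s roughly what my hand-rolled version would look like: a wrapper Dataset that runs the expensive transform once per item and caches the result to disk with torch.save (the class name, cache layout, and toy transform below are just my own placeholders, and the demo uses synthetic tensors instead of real audio):

```python
import os
import tempfile
import torch
from torch.utils.data import Dataset

class CachedFeatureDataset(Dataset):
    """Sketch: cache each item's transformed features as a .pt file on disk."""

    def __init__(self, raw_items, transform, cache_dir):
        self.raw_items = raw_items
        self.transform = transform
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def __len__(self):
        return len(self.raw_items)

    def __getitem__(self, idx):
        path = os.path.join(self.cache_dir, f"{idx}.pt")
        if os.path.exists(path):
            # Cache hit: skip the transform entirely.
            return torch.load(path)
        features = self.transform(self.raw_items[idx])
        torch.save(features, path)
        return features

# Demo with synthetic "waveforms" and a toy peak-normalization transform.
raw = [torch.randn(1, 1600) for _ in range(3)]
with tempfile.TemporaryDirectory() as d:
    ds = CachedFeatureDataset(raw, lambda w: w / w.abs().max(), d)
    first = ds[0]   # computed and written to <d>/0.pt
    again = ds[0]   # loaded back from the cache file
    print(torch.equal(first, again))  # True
```

This works, but it’s exactly the kind of boilerplate I was hoping the Dataset interface (or some torchaudio/torchvision utility) already covered.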
Best,
Adam