Audio data augmentation

Hello! I am quite new to PyTorch and to training DNN models in general. I'm working on audio source separation and would like to augment my dataset by cropping random overlapping segments of audio, adding noise, etc. What I'm unsure about is how data augmentation generally works: do I augment my data once, save it to the HDD, and then load it, or is it done "per batch" and only stored temporarily? I'm not sure if I have explained this properly. If the latter is the answer, is there anything for audio similar to PyTorch's image augmentations (random crops, etc.)? Thanks in advance!

Data augmentation is typically applied on the fly to each batch during training, so nothing needs to be written to disk.
If you use a Dataset and pass it to a DataLoader with multiple workers, the data loading and augmentation run in background worker processes while the model is training, so the augmentation usually doesn't slow down the training loop.
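A minimal sketch of this pattern for your use case (the class name, crop length, and noise level are just placeholders; random tensors stand in for loaded audio files):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class AugmentedAudioDataset(Dataset):
    """Augmentations run inside __getitem__, so every epoch sees a freshly
    cropped and noised version of each clip; nothing is saved to disk."""

    def __init__(self, waveforms, segment_len=16000, noise_std=0.005):
        self.waveforms = waveforms      # list of 1-D tensors (assumed mono)
        self.segment_len = segment_len  # crop length in samples (placeholder value)
        self.noise_std = noise_std      # additive noise level (placeholder value)

    def __len__(self):
        return len(self.waveforms)

    def __getitem__(self, idx):
        wav = self.waveforms[idx]
        # random crop of a fixed-length segment
        start = torch.randint(0, wav.numel() - self.segment_len + 1, (1,)).item()
        segment = wav[start:start + self.segment_len]
        # additive Gaussian noise
        return segment + self.noise_std * torch.randn_like(segment)

# synthetic stand-in data: ten 2-second clips at 16 kHz
data = [torch.randn(32000) for _ in range(10)]
loader = DataLoader(AugmentedAudioDataset(data), batch_size=4, num_workers=2)
batch = next(iter(loader))
print(batch.shape)  # torch.Size([4, 16000])
```

With `num_workers=2`, the cropping and noise addition happen in background processes while the GPU is busy with the previous batch.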

I would have a look at torchaudio, which ships with some transformations.

Hey Peter! Thank you for your reply. I will try using torchaudio for my case. 🙂