I am using PyTorch to process audio data through conv1d layers, but I am running out of RAM (only 8GB available). I have an input .wav and a target .wav for the network to learn, and each file is 40MB (about 4 minutes of audio).
In this model, each output sample of audio is predicted from the previous 200 samples (input 200 samples, output 1 sample). To accomplish this, I am taking the input's 8,000,000 samples and “unfolding” them into shape (8000000, 200, 1), so each audio sample becomes an array of the 200 samples that precede it. I then train with “unfolded_input_samples” as the input data and “target_samples” as the targets.
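For reference, my unfolding step looks roughly like this (the function name and the tiny input are just for illustration, not my real preprocessing code):

```python
import torch

def unfold_signal(samples, window=200):
    # Build one window of the previous `window` samples per output sample.
    # Stacking copies the data, so the result is ~window times the input size,
    # which is what blows up my RAM on the full 8M-sample signal.
    windows = torch.stack([samples[i:i + window]
                           for i in range(len(samples) - window)])
    return windows.unsqueeze(-1)  # shape: (N - window, window, 1)

x = torch.arange(10, dtype=torch.float32)
w = unfold_signal(x, window=4)
# w.shape == (6, 4, 1); w[0] holds samples 0..3
```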
The problem is that I quickly run out of RAM when unfolding the input data. Is there a way to avoid creating this massive array while still telling PyTorch to use the previous 200 samples for each output data point? Can I break the unfolded input array into chunks and train on each part without starting a new epoch? Or is there an easier way to accomplish this with some kind of built-in method in PyTorch? Thanks!
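To make the chunking idea concrete, this is the kind of loop I have in mind; the model, sizes, and optimizer here are placeholders, not my real network. I understand `torch.Tensor.unfold` returns a strided view, so unfolding only the current chunk should keep the full (N, 200) array from ever existing at once:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
signal = torch.randn(100_000)      # stand-in for the full 8M-sample signal
window = 200                       # predict each sample from the previous 200
chunk_len = 10_000                 # windows processed per chunk

model = nn.Linear(window, 1)       # placeholder for my conv1d network
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One "epoch": walk over the signal in chunks, unfolding only the
# current chunk instead of the whole array.
for start in range(0, len(signal) - window - chunk_len + 1, chunk_len):
    chunk = signal[start : start + chunk_len + window]
    x = chunk.unfold(0, window, 1)[:-1]   # (chunk_len, window) view
    y = chunk[window:].unsqueeze(1)       # next-sample targets, (chunk_len, 1)
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

Is something like this valid, i.e. does looping over chunks this way behave the same as one pass over the fully unfolded array?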