Looking for a better way to split up input data to conv1d so I don’t run out of RAM

I am using PyTorch to process audio data through conv1d layers; however, I am running out of RAM (only 8 GB available). I have an input .wav and a target .wav for the network to learn, and each file is 40 MB (about 4 minutes of audio).

In this model, one sample of audio is predicted from the previous 200 samples (input 200 samples, output 1 sample). To accomplish this, I take (for example) the 8,000,000 input samples and “unfold” them into an array of shape (8000000, 200, 1), where each audio sample becomes an array of the previous 200 samples. I then train with “unfolded_input_samples” as the input data and “target_samples” as the targets.
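Roughly, the unfolding I’m describing looks like the sketch below (the sizes are just the numbers from above, and materializing the copy is what eats the RAM):

```python
import torch

audio = torch.randn(8_000_000)        # stand-in for the loaded input .wav samples
window = 200

# view of shape (7999801, 200): each row is a sample plus the 199 before it
unfolded = audio.unfold(0, window, 1)

# materializing the copy is what blows past 8 GB
# (~8e6 * 200 * 4 bytes ≈ 6.4 GB at float32)
unfolded_input_samples = unfolded.unsqueeze(-1).contiguous()
target_samples = audio[window:]
```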

The problem is that I quickly run out of RAM when unfolding the input data. Is there a way to avoid creating this massive array while still telling PyTorch to use the previous 200 samples for each output data point? Can I break the unfolded input array into chunks and train on each part without starting a new epoch? Or is there an easier way to accomplish this with some kind of built-in method in PyTorch? Thanks!

I’m not sure of the motivation for this ‘unfolded’ tensor. The data is 99.5% redundant, and the 200-sample window can be created at train time from the audio-data tensor: just take a slice audio[i:i+200] of the 8M-sample audio tensor, and audio[i+200] as your target.
The only extra ingredient, then, is the set of offset indices into your audio data, which you can sample randomly or sequentially.
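For example, here is a minimal sketch of a map-style Dataset that does the slicing on the fly; the class name, window size, and the commented usage names are just for illustration:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class WindowedAudioDataset(Dataset):
    """Yields (previous 200 samples, next sample) pairs by slicing on the fly."""
    def __init__(self, audio, window=200):
        self.audio = audio      # 1-D tensor of samples, e.g. shape (8_000_000,)
        self.window = window

    def __len__(self):
        # one example per position that has a full window before it
        return self.audio.shape[0] - self.window

    def __getitem__(self, i):
        x = self.audio[i : i + self.window]   # the previous 200 samples
        y = self.audio[i + self.window]       # the sample to predict
        return x.unsqueeze(0), y              # shape (1, 200) so conv1d sees one channel

# usage: the DataLoader handles shuffling the offsets and batching
# dataset = WindowedAudioDataset(audio_tensor)
# loader = DataLoader(dataset, batch_size=256, shuffle=True)
```

Only the original 1-D audio tensor stays in memory; each window is a cheap view created per item, so nothing close to the unfolded array is ever allocated.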

That’s exactly what I’m looking for, thanks! I should have been more specific: I’m actually using PyTorch Lightning, so I may need to make a lower-level PyTorch training loop to index the data that way during training.

Thanks for the quick response!