I am using PyTorch to process audio data through conv1d layers, but I am running out of RAM (only 8GB available). I have an input .wav and a target .wav for the network to learn, and each file is 40MB (about 4 minutes of audio).
In this model, each output sample of audio is predicted from the previous 200 samples (input 200 samples, output 1 sample). To accomplish this, I am taking the input's 8,000,000 samples and “unfolding” them into shape (8000000, 200, 1), so each audio sample becomes an array of the 200 samples that precede it. I then train with “unfolded_input_samples” as the input data and “target_samples” as the targets.
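For reference, my unfolding step looks roughly like this (the function name and the tiny input are just for illustration, not my real preprocessing code):

```python
import torch

def unfold_signal(samples, window=200):
    # Build one window of the previous `window` samples per output sample.
    # Stacking copies the data, so the result is ~window times the input size,
    # which is what blows up my RAM on the full 8M-sample signal.
    windows = torch.stack([samples[i:i + window]
                           for i in range(len(samples) - window)])
    return windows.unsqueeze(-1)  # shape: (N - window, window, 1)

x = torch.arange(10, dtype=torch.float32)
w = unfold_signal(x, window=4)
# w.shape == (6, 4, 1); w[0] holds samples 0..3
```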
The problem is that I quickly run out of RAM when unfolding the input data. Is there a way to avoid creating this massive array while still telling PyTorch to use the previous 200 samples for each output data point? Can I break the unfolded input array into chunks and train on each part without starting a new epoch? Or is there an easier way to accomplish this with some kind of built-in method in PyTorch? Thanks!
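To make the chunking idea concrete, this is the kind of loop I have in mind; the model, sizes, and optimizer here are placeholders, not my real network. I understand `torch.Tensor.unfold` returns a strided view, so unfolding only the current chunk should keep the full (N, 200) array from ever existing at once:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
signal = torch.randn(100_000)      # stand-in for the full 8M-sample signal
window = 200                       # predict each sample from the previous 200
chunk_len = 10_000                 # windows processed per chunk

model = nn.Linear(window, 1)       # placeholder for my conv1d network
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One "epoch": walk over the signal in chunks, unfolding only the
# current chunk instead of the whole array.
for start in range(0, len(signal) - window - chunk_len + 1, chunk_len):
    chunk = signal[start : start + chunk_len + window]
    x = chunk.unfold(0, window, 1)[:-1]   # (chunk_len, window) view
    y = chunk[window:].unsqueeze(1)       # next-sample targets, (chunk_len, 1)
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

Is something like this valid, i.e. does looping over chunks this way behave the same as one pass over the fully unfolded array?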