I’ve got a dataset consisting of x number of 10-second recordings, and a data loader which can load these in one by one. However, I want my model to predict a value for each timestep, and I was hoping to feed it something like audio_file1[timestep_n - 2 : timestep_n + 2] as a training example, from which it would make a prediction for timestep_n. I’m not sure of the best way to go about this with PyTorch.
Currently, I’m thinking that I might have to include another nested loop after the data loader loads in the audio file, where I break the file into blocks and run through it using moving-average-style indexing. Once the model has been fed all those blocks, the data loader would then load the next audio file.
Not sure if it’s relevant, but at present I’m just looking to predict one feature per timestep per frequency band. In the future, though, I’m looking to extend this to predict multiple features per timestep per frequency band.
You could set it up so that the dataloader itself works through the files in that manner, i.e. rather than breaking the files up in an external loop, create a custom dataset that breaks them up into timestep chunks for you.
I did consider this, but then would I still need another internal loop to run all the chunks through the model? In my head it looks like this…
for idx, batch in enumerate(dataloader):
    for x in y:
        # code that either feeds chunks into the model
        # or code that takes chunks of the audio file and feeds them into the model
Is there a way I could get the dataset to break it into chunks but also only return one chunk at a time?
I don’t think there’s anything stopping your dataset from doing so. Just make sure the length is set appropriately (i.e. the total number of chunks it can return), and then have each index point to a specific chunk. E.g. if you have 10 10-second clips and your timestep is 1s, you’d calculate the length as 100 (and return that from __len__). Then, when asked for index 55, you’d return chunk 5 of file 5 counting from zero, i.e. the 6th chunk of the 6th file. Everything should still work with batching, shuffled orders etc. as normal.
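To make the index arithmetic concrete, here is a minimal sketch of that kind of dataset. The class name ChunkedClips, the in-memory tensor layout (num_clips, num_samples) and the rate of 100 samples per second are all assumptions for illustration, not anything from your setup.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ChunkedClips(Dataset):
    """Hypothetical dataset: every index maps to one fixed-length chunk of one clip."""

    def __init__(self, clips, chunk_len):
        # clips: tensor of shape (num_clips, num_samples), assumed already in memory
        self.clips = clips
        self.chunk_len = chunk_len
        self.chunks_per_clip = clips.shape[1] // chunk_len

    def __len__(self):
        # Total number of chunks across all clips
        return self.clips.shape[0] * self.chunks_per_clip

    def __getitem__(self, idx):
        # Flat index -> (clip index, chunk index within that clip), both zero-based
        clip_idx, chunk_idx = divmod(idx, self.chunks_per_clip)
        start = chunk_idx * self.chunk_len
        return self.clips[clip_idx, start:start + self.chunk_len]


# 10 clips of 10 "seconds" at a made-up rate of 100 samples per second
clips = torch.arange(10 * 1000, dtype=torch.float32).reshape(10, 1000)
dataset = ChunkedClips(clips, chunk_len=100)
loader = DataLoader(dataset, batch_size=8, shuffle=True)

for batch in loader:
    # batch has shape (batch_size, chunk_len); a single loop is enough
    pass
```

With this layout the DataLoader handles batching and shuffling of chunks, so no inner loop over chunks is needed in the training code.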
The fastest way would probably be to load all the audio clips into memory and split them up front when initialising the dataset, so that each lookup is a normal array access, but that would be memory intensive.
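A rough sketch of the pre-splitting idea, assuming the clips already sit in one tensor of shape (num_clips, num_samples). The sizes are made up, and Tensor.unfold is just one way to do the split.

```python
import torch
from torch.utils.data import TensorDataset

# Stand-in data: 10 clips of 1000 samples each (sizes are made up)
clips = torch.randn(10, 1000)
chunk_len = 100

# unfold(dimension, size, step): step == size gives non-overlapping chunks
chunks = clips.unfold(1, chunk_len, chunk_len)  # shape (10, 10, 100)
chunks = chunks.reshape(-1, chunk_len)          # shape (100, 100)

# Every chunk is now a plain row lookup
dataset = TensorDataset(chunks)
```

Using a step smaller than the size would give overlapping windows instead, at the cost of more memory once the result is reshaped.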
Ahhh okay, I see what you mean. So, calculate how many chunks I want the dataset to be able to return, then do some index wrangling in __getitem__() to ensure I’m taking the desired slice of the data. That sounds very doable.
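For the sliding-window case from the original question (two frames of context on either side of the timestep being predicted), the index wrangling might look roughly like the sketch below. The class name WindowedFrames, the (num_files, num_steps, num_bands) layout and the choice to skip edge frames rather than pad are all assumptions.

```python
import torch
from torch.utils.data import Dataset


class WindowedFrames(Dataset):
    """Hypothetical dataset: one item = a window of frames plus the centre frame's target."""

    def __init__(self, features, targets, context=2):
        # features: (num_files, num_steps, num_bands)
        # targets:  (num_files, num_steps, num_bands)
        self.features = features
        self.targets = targets
        self.context = context
        # Only centres with a full window on both sides are used (no padding)
        self.windows_per_file = features.shape[1] - 2 * context

    def __len__(self):
        return self.features.shape[0] * self.windows_per_file

    def __getitem__(self, idx):
        file_idx, offset = divmod(idx, self.windows_per_file)
        centre = offset + self.context
        # Python slices exclude the end index, hence the + 1
        window = self.features[file_idx, centre - self.context:centre + self.context + 1]
        return window, self.targets[file_idx, centre]


features = torch.randn(3, 20, 8)  # 3 files, 20 timesteps, 8 frequency bands (made up)
targets = torch.randn(3, 20, 8)
dataset = WindowedFrames(features, targets, context=2)
window, target = dataset[0]
```

Note that a slice written as [n - 2 : n + 2] stops at n + 1, so the sketch uses n + 3 as the end to keep the window symmetric around n.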
At the moment I’m only working with a small subset of my dataset, just to get everything up and running. But the entire dataset is stored in a single HDF5 dataset/array, so indexing should be fairly straightforward. The actual dataset consists of 64 10-minute recordings.
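If the full set turns out to be too big for memory, the same index mapping can read chunks straight from the HDF5 file using h5py. This is only a sketch: the class name H5Chunks, the dataset key "recordings" and the 2-D (num_recordings, num_samples) layout are assumptions, and the tiny file written at the bottom is just stand-in data.

```python
import os
import tempfile

import h5py
import numpy as np
import torch
from torch.utils.data import Dataset


class H5Chunks(Dataset):
    """Hypothetical dataset reading fixed-length chunks lazily from an HDF5 array."""

    def __init__(self, path, key, chunk_len):
        self.path, self.key, self.chunk_len = path, key, chunk_len
        self.file = None  # opened on first access, so each DataLoader worker gets its own handle
        with h5py.File(path, "r") as f:
            num_recordings, num_samples = f[key].shape
        self.chunks_per_recording = num_samples // chunk_len
        self.length = num_recordings * self.chunks_per_recording

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self.file is None:
            self.file = h5py.File(self.path, "r")
        rec_idx, chunk_idx = divmod(idx, self.chunks_per_recording)
        start = chunk_idx * self.chunk_len
        # Slicing an h5py dataset reads only the requested range from disk
        chunk = self.file[self.key][rec_idx, start:start + self.chunk_len]
        return torch.from_numpy(chunk)


# Tiny stand-in file: 4 recordings of 60 samples each
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
data = np.arange(4 * 60, dtype=np.float32).reshape(4, 60)
with h5py.File(path, "w") as f:
    f.create_dataset("recordings", data=data)

dataset = H5Chunks(path, "recordings", chunk_len=10)
```

Opening the file inside __getitem__ rather than __init__ is a common precaution when using several DataLoader workers, since sharing one open handle across processes can cause problems.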
I’m still pretty new to PyTorch, so I’m still getting to grips with how customizable things like Datasets are. Thanks very much!