Stack frames for DNN input

I have a dataset of N audio frames with shape (N, 13), which I am using for phoneme recognition:

train_data = torch.hstack((train_feat, train_labels))
train_loader = torch.utils.data.DataLoader(train_data, batch_size=128, shuffle=True)
print(train_data.shape)
torch.Size([3082092, 14])

How can I stack 7 frames each time to feed to the DNN?

I’m not sure which dimension refers to the “frame dimension”, but you could probably either use torch.stack or torch.cat to create the new stacked/concatenated tensor.
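
For example, something along these lines (just a sketch, with random tensors standing in for 7 consecutive frames of 13 features each):

import torch

frames = [torch.randn(13) for _ in range(7)]   # stand-ins for 7 consecutive frames
stacked = torch.stack(frames)                  # shape [7, 13]: frames kept as separate rows
flat = torch.cat(frames)                       # shape [91]: frames concatenated into one vector
print(stacked.shape, flat.shape)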

Hi @ptrblck. The x_train dataset has 3082092 frames, and each frame has 13 numbers (features).
The y_train tensor has 3082092 digits (labels).
That is, for each frame (1, 13) there is one label…
Now, feeding one frame to the DNN is not going to work because there is too little information in it. Instead, I would like to stack a sequence of 7 frames (of those 3082092). I hope that makes sense.

Thanks for the explanation. The 7 frames would thus correspond to the batch size and you could set it in the DataLoader.
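
Roughly like this, reusing your train_data from above (just a sketch; shuffle=False would keep consecutive frames together in a batch):

loader = torch.utils.data.DataLoader(train_data, batch_size=7, shuffle=False)
batch = next(iter(loader))
print(batch.shape)   # torch.Size([7, 14]) -> 7 frames, each with 13 features + 1 label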

In that case, the DNN would look at them as 7 individual units, but I want to stack the 7 frames into a single unit.
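
Something like this rough sketch is what I have in mind (assuming a sliding window of 7 consecutive frames, and using the label of the center frame per window, which is just my assumption):

windows = train_feat.unfold(0, 7, 1)                    # [3082086, 13, 7]: sliding windows of 7 frames
windows = windows.permute(0, 2, 1).reshape(-1, 7 * 13)  # [3082086, 91]: one flat vector per window
window_labels = train_labels[3:-3]                      # label of the center frame (my assumption)
print(windows.shape)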

Could you explain how one “unit” would be processed in the model and what the expected input shape would thus be?