I have some time series data padded with 0s, in the shape (batch, length, features). More specifically, I extracted MFCCs from audio files, so each input is a (60, 40) array: 60 frames with 40 MFCCs per frame.
I used to work in TensorFlow, where I applied the Masking layer with the value I wished to mask.
I am trying to do the same thing in PyTorch. I have done some research on this, and it appears that pack_padded_sequence is the only way to mask padding for a PyTorch RNN.
I have rewritten the dataset preparation code and created a list containing all the 2D arrays. The list has a length of 12746, and each 2D array inside has the shape (x, 40), where "x" can be any number up to 60. So after padding, the training data would have the shape (12746, 60, 40).
How should I proceed, given that a packed sequence cannot be turned into a PyTorch Dataset? This is my current attempt:
```python
class mydata(Dataset):
    def __init__(self, X, y):
        self.X = torch.FloatTensor(X)
        self.y = torch.FloatTensor(y)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, index):
        y = self.y[index]
        X = self.X[index]
        return X, y


padded = pad_sequence(data, batch_first=True, padding_value=0.0)
lengths = torch.tensor([len(t) for t in data])
# print('#padded', padded)
print('--------------------------------------------')
packed = torch.nn.utils.rnn.pack_padded_sequence(
    padded, lengths.to('cpu'), batch_first=True, enforce_sorted=False)

# split them into 0.7 proportion. It was done using Train_test_split.
X_train = packed[0:8900]
y_train = y[:8900]
X_valid = packed[8900:]
y_valid = y[8900:]

train_dataset = mytools.mydata(X_train, y_train)
valid_dataset = mytools.mydata(X_valid, y_valid)
trainloader = DataLoader(train_dataset, batch_size=256, shuffle=True, num_workers=0)
validloader = DataLoader(valid_dataset, batch_size=256, shuffle=False, num_workers=0)
```
I was wondering: should this pack/pad procedure be done after I create the DataLoader, right before feeding the input to the RNN?
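To make that last idea concrete, here is a minimal sketch of what I have in mind, not a confirmed solution. The Dataset keeps the unpadded (x, 40) tensors, a custom collate_fn pads each batch and returns the true lengths, and the packing happens inside the model's forward, right before the LSTM. The names VariableLengthDataset, pad_collate and MaskedLSTM, the hidden size of 64, the single-output head and the random toy data are all placeholders I made up for illustration.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence
from torch.utils.data import Dataset, DataLoader


class VariableLengthDataset(Dataset):
    """Hypothetical dataset that stores the unpadded (x, 40) sequences."""

    def __init__(self, sequences, labels):
        self.sequences = [torch.as_tensor(s, dtype=torch.float32) for s in sequences]
        self.labels = torch.as_tensor(labels, dtype=torch.float32)

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, index):
        return self.sequences[index], self.labels[index]


def pad_collate(batch):
    """Pad one batch to its own longest sequence and keep the true lengths."""
    sequences, labels = zip(*batch)
    lengths = torch.tensor([len(s) for s in sequences])
    padded = pad_sequence(sequences, batch_first=True, padding_value=0.0)
    return padded, lengths, torch.stack(labels)


class MaskedLSTM(nn.Module):
    """Hypothetical model: packs the padded batch right before the LSTM."""

    def __init__(self, n_features=40, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, padded, lengths):
        # lengths must live on the CPU for pack_padded_sequence
        packed = pack_padded_sequence(
            padded, lengths.cpu(), batch_first=True, enforce_sorted=False)
        _, (h_n, _) = self.lstm(packed)
        # h_n[-1] is the last layer's state at each sequence's final real frame
        return self.head(h_n[-1]).squeeze(-1)


# Toy stand-in for the real list of 12746 arrays (random data, assumption).
torch.manual_seed(0)
data = [torch.randn(int(n), 40) for n in torch.randint(1, 61, (32,))]
y = torch.rand(32)

loader = DataLoader(VariableLengthDataset(data, y), batch_size=8,
                    shuffle=True, collate_fn=pad_collate)
model = MaskedLSTM()

padded, lengths, labels = next(iter(loader))
out = model(padded, lengths)
```

With this layout the train/validation split would be done on the plain Python list and the labels, before anything is padded or packed. Each batch is padded only to its own longest sequence, so the second dimension is not necessarily 60.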