How to perform zero masking for an RNN in PyTorch

LXR · July 28, 2022, 5:30pm

I have time-series data padded with zeros, with shape (batch, length, features). Specifically, I extracted MFCCs from audio files, so each input has shape (60, 40): 60 frames with 40 MFCCs per frame.

I previously used TensorFlow, where I applied the Masking layer with the value I wanted to mask.

I am trying to do the same thing in PyTorch. I did some research and found people mentioning pack_padded_sequence from torch.nn.utils.rnn.

It appears that pack_padded_sequence is the only way to mask padded time steps for a PyTorch RNN.
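For reference, here is a minimal sketch of how pack_padded_sequence is typically combined with an RNN layer. The batch contents, the hidden size of 16, and the three example lengths are assumptions chosen only for illustration; they are not taken from the real data.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Hypothetical batch: 3 sequences zero-padded to 60 frames, 40 MFCCs per frame.
lengths = torch.tensor([59, 45, 30])
padded = torch.zeros(3, 60, 40)
for i, n in enumerate(lengths.tolist()):
    padded[i, :n] = torch.randn(n, 40)

# hidden_size=16 is an arbitrary choice for this sketch.
lstm = nn.LSTM(input_size=40, hidden_size=16, batch_first=True)

# lengths must be a CPU tensor (or list); enforce_sorted=False allows any order.
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
packed_out, (h_n, c_n) = lstm(packed)

# Unpack to a padded tensor again; steps beyond each length are filled with zeros.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True, total_length=60)
```

With a packed input, h_n holds the hidden state at the last real frame of each sequence rather than at the last padded frame, which is the effect the masking is meant to achieve.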

I have rewritten the dataset preparation code and created a list containing all the 2D arrays. The list has a length of 12746, and each 2D array has shape (x, 40), where “x” can be any number lower than 60. So after padding, the training data would have shape (12746, 60, 40).
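To make that layout concrete, the following is a rough sketch of padding such a list to a fixed length while also recording each true length, which is needed later for packing. The helper name pad_to_fixed and the three random arrays are hypothetical stand-ins, not the actual preparation code.

```python
import numpy as np
import torch

MAX_LEN, N_MFCC = 60, 40  # frame and MFCC counts from the description above

def pad_to_fixed(arrays, max_len=MAX_LEN):
    """Zero-pad a list of (x, 40) arrays to (N, max_len, 40) and record each x."""
    padded = torch.zeros(len(arrays), max_len, arrays[0].shape[1])
    lengths = torch.zeros(len(arrays), dtype=torch.long)
    for i, a in enumerate(arrays):
        n = a.shape[0]
        padded[i, :n] = torch.as_tensor(a, dtype=torch.float32)
        lengths[i] = n
    return padded, lengths

# Hypothetical stand-in for the real list of 12746 arrays.
rng = np.random.default_rng(0)
data = [rng.standard_normal((x, N_MFCC)) for x in (59, 12, 37)]
padded, lengths = pad_to_fixed(data)
```

Keeping lengths next to padded means both can be indexed per sample, which a packed sequence does not allow.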

How should I proceed, given that a packed sequence cannot be stored in a PyTorch Dataset?

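import torch
from torch.utils.data import Dataset

class mydata(Dataset):
    def __init__(self, X, y):
        # Store features and labels as float32 tensors
        self.X = torch.FloatTensor(X)
        self.y = torch.FloatTensor(y)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, index):
        # Return one (features, label) pair
        X = self.X[index]
        y = self.y[index]
        return X, y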


from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# data: the list of 12746 sequences, each of shape (x, 40), as tensors
padded = pad_sequence(data, batch_first=True, padding_value=0.0)
lengths = torch.tensor([len(t) for t in data])  # true length of each sequence

# lengths must be a CPU tensor; enforce_sorted=False allows any batch order
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

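# Split with a 0.7 train proportion (previously done using train_test_split).
# Note: this is the step that does not work. A PackedSequence is a named tuple
# of (data, batch_sizes, sorted_indices, unsorted_indices), so slicing it
# returns some of those fields rather than a range of samples.
X_train = packed[0:8900]
y_train = y[:8900]
X_valid = packed[8900:]
y_valid = y[8900:]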

train_dataset = mytools.mydata(X_train,y_train)
valid_dataset = mytools.mydata(X_valid,y_valid)
trainloader = DataLoader(train_dataset, batch_size=256, shuffle=True, num_workers=0)
validloader = DataLoader(valid_dataset, batch_size=256, shuffle=False, num_workers=0)

I was wondering: should the pad-and-pack step instead be done after creating the DataLoader, right before feeding the input to the RNN?
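One possible arrangement along those lines is sketched here: the Dataset keeps the variable-length sequences, a collate function pads each batch and returns the lengths, and packing happens inside the loop just before the RNN call. The names VarLenDataset and collate, the toy sequences, and the hidden size are all assumptions made for illustration.

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

class VarLenDataset(Dataset):  # hypothetical name
    """Holds unpadded sequences of shape (x, 40) and one label per sequence."""
    def __init__(self, sequences, labels):
        self.sequences = [torch.as_tensor(s, dtype=torch.float32) for s in sequences]
        self.labels = torch.as_tensor(labels, dtype=torch.float32)

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, index):
        return self.sequences[index], self.labels[index]

def collate(batch):
    """Pad one batch to its longest sequence and return the true lengths."""
    seqs, labels = zip(*batch)
    lengths = torch.tensor([len(s) for s in seqs])
    padded = pad_sequence(list(seqs), batch_first=True, padding_value=0.0)
    return padded, lengths, torch.stack(labels)

# Hypothetical toy data standing in for the real MFCC arrays and labels.
torch.manual_seed(0)
sequences = [torch.randn(n, 40) for n in (58, 12, 37, 50, 8)]
labels = [0.0, 1.0, 0.0, 1.0, 1.0]

loader = DataLoader(VarLenDataset(sequences, labels), batch_size=2,
                    shuffle=False, collate_fn=collate)
rnn = nn.LSTM(input_size=40, hidden_size=16, batch_first=True)

batch_shapes, final_states = [], []
for padded, lengths, y in loader:
    # Pack right before the RNN so padded frames are skipped.
    packed = pack_padded_sequence(padded, lengths, batch_first=True,
                                  enforce_sorted=False)
    _, (h_n, _) = rnn(packed)
    batch_shapes.append(tuple(padded.shape))
    final_states.append(h_n[-1])  # last real hidden state per sequence
```

Because the split into training and validation sets is then applied to the plain list of sequences, no PackedSequence ever needs to be sliced or stored.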