I have some time series data padded with 0s, in the shape (batch, length, features). More specifically, I extracted MFCCs from audio files: each input is a (60, 40) array, that is, 60 frames with 40 MFCCs per frame.
I previously used TensorFlow, where I applied the `Masking` layer with the value I wanted to mask.
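Keras' `Masking` layer treats a timestep as padding when every feature equals the mask value. PyTorch's packing utilities instead need an explicit length per sequence. The sketch below shows how those lengths could be recovered from an already zero-padded (batch, length, features) array. It assumes padding only at the end; `lengths_from_zero_padding` is a hypothetical helper, not a library function.

```python
import numpy as np


def lengths_from_zero_padding(padded: np.ndarray, mask_value: float = 0.0) -> np.ndarray:
    """Hypothetical helper: true length of each sequence in a (batch, length, features) array.

    Uses the same rule as Keras' Masking layer (a frame is padding only if *all*
    features equal mask_value) and assumes padding sits at the end of each sequence.
    """
    # True where at least one feature differs from mask_value, i.e. a real frame
    is_real = np.any(padded != mask_value, axis=-1)  # (batch, length)
    n_steps = is_real.shape[1]
    # Position of the last real frame + 1; an all-zero frame in the middle does not shorten it
    last_real = n_steps - np.argmax(is_real[:, ::-1], axis=1)
    # Sequences with no real frame at all get length 0
    return np.where(is_real.any(axis=1), last_real, 0)
```

Using the last real frame rather than counting non-zero frames keeps a genuinely silent (all-zero) frame inside a sequence from being miscounted as padding.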
I am trying to do the same thing in PyTorch. From my research, people mention `pack_padded_sequence` from `torch.nn.utils.rnn`, and it appears that `pack_padded_sequence` is the only way to do masking for a PyTorch RNN.
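For reference, a minimal sketch of the pad, pack, run, unpack round trip, using random tensors in place of real MFCCs (the GRU, its hidden size of 16, and the sequence lengths are arbitrary choices for illustration). The final check shows that the RNN's last hidden state for each sequence does not depend on the padding.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Three MFCC-like sequences with different numbers of frames, 40 features each
seqs = [torch.randn(n, 40) for n in (5, 3, 7)]
lengths = torch.tensor([len(s) for s in seqs])  # must live on the CPU

padded = pad_sequence(seqs, batch_first=True)  # (3, 7, 40), zero-padded
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

rnn = nn.GRU(input_size=40, hidden_size=16, batch_first=True)
packed_out, h_n = rnn(packed)  # h_n: (1, 3, 16), taken at each sequence's last real frame

# Back to a padded tensor; outputs past each sequence's length are filled with 0
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

# Sanity check: the final hidden state matches running each sequence without padding
for i, s in enumerate(seqs):
    _, h_single = rnn(s.unsqueeze(0))
    assert torch.allclose(h_n[:, i], h_single[:, 0], atol=1e-5)
```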
I have rewritten the dataset preparation code and created a list containing all the 2D arrays. The list has 12746 elements, and each 2D array has the shape (x, 40), where x is the number of frames and can be any number up to 60. So I am going to prepare the training data in the shape (12746, 60, 40).
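Note that `pad_sequence` pads only up to the longest sequence in the list it receives, so the result is (12746, 60, 40) only if at least one sample has exactly 60 frames. If a fixed length of 60 is required regardless, a sketch like the following could build the padded tensor and the lengths together. `pad_to_fixed_length` is a hypothetical helper; `max_len=60` and `n_features=40` come from the shapes above, and truncating longer inputs is an assumption.

```python
import numpy as np
import torch


def pad_to_fixed_length(arrays, max_len=60, n_features=40, pad_value=0.0):
    """Hypothetical helper: stack (x, n_features) arrays into (N, max_len, n_features).

    Always produces max_len frames (pad_sequence would stop at the longest input).
    Inputs longer than max_len are truncated (an assumption, not required by PyTorch).
    Returns the padded float tensor and a long tensor of true lengths.
    """
    out = torch.full((len(arrays), max_len, n_features), pad_value, dtype=torch.float32)
    lengths = torch.empty(len(arrays), dtype=torch.long)
    for i, a in enumerate(arrays):
        t = torch.as_tensor(a, dtype=torch.float32)[:max_len]  # accepts NumPy arrays or tensors
        out[i, : t.shape[0]] = t  # real frames first, padding after
        lengths[i] = t.shape[0]
    return out, lengths
```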
How should I proceed, given that a packed sequence cannot be turned into a PyTorch `Dataset`?
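One way around this, sketched here under the assumption that packing can be postponed until the forward pass, is to keep the dataset as ordinary padded tensors and return each sample's length alongside it. `PaddedSeqDataset` is a hypothetical name, and the toy tensors stand in for the real (12746, 60, 40) data.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class PaddedSeqDataset(Dataset):
    """Hypothetical Dataset: padded inputs plus their true lengths.

    Each item is a plain tensor, so DataLoader can index, shuffle and batch it;
    the lengths travel with the batch so the model can pack right before the RNN.
    """

    def __init__(self, X_padded, lengths, y):
        self.X = torch.as_tensor(X_padded, dtype=torch.float32)   # (N, 60, 40)
        self.lengths = torch.as_tensor(lengths, dtype=torch.long)  # (N,)
        self.y = torch.as_tensor(y, dtype=torch.float32)          # (N,)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, index):
        return self.X[index], self.lengths[index], self.y[index]


# Toy stand-ins for the real MFCC array, lengths and labels
X_padded = torch.zeros(10, 60, 40)
lengths = torch.randint(1, 61, (10,))
y = torch.randint(0, 2, (10,)).float()

loader = DataLoader(PaddedSeqDataset(X_padded, lengths, y), batch_size=4, shuffle=True)
xb, lb, yb = next(iter(loader))  # default collation stacks each field
```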
```python
# mytools.py
import torch
from torch.utils.data import Dataset


class mydata(Dataset):
    def __init__(self, X, y):
        self.X = torch.FloatTensor(X)
        self.y = torch.FloatTensor(y)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, index):
        y = self.y[index]
        X = self.X[index]
        return X, y
```

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence
from torch.utils.data import DataLoader

import mytools

# data: list of 12746 tensors of shape (x, 40) (pad_sequence expects tensors); y: matching labels
padded = pad_sequence(data, batch_first=True, padding_value=0.0)
lengths = torch.tensor([len(t) for t in data])
# print('#padded', padded)
print('--------------------------------------------')
packed = pack_padded_sequence(padded, lengths.to('cpu'), batch_first=True, enforce_sorted=False)

# Split into a 0.7 proportion (originally done with train_test_split).
# PackedSequence is a namedtuple (data, batch_sizes, sorted_indices, unsorted_indices),
# so slicing it returns a tuple of those fields, not a subset of the samples.
X_train = packed[0:8900]
y_train = y[:8900]
X_valid = packed[8900:]
y_valid = y[8900:]

train_dataset = mytools.mydata(X_train, y_train)
valid_dataset = mytools.mydata(X_valid, y_valid)
trainloader = DataLoader(train_dataset, batch_size=256, shuffle=True, num_workers=0)
validloader = DataLoader(valid_dataset, batch_size=256, shuffle=False, num_workers=0)
```
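Because a `PackedSequence` cannot be sliced by sample, one option is to split the raw list of variable-length arrays first and leave padding or packing for later. The sketch below does that by shuffled index; `split_sequences` is a hypothetical helper, and `train_frac=0.7` mirrors the proportion in the comment above (scikit-learn's `train_test_split` also accepts plain lists, so it could be applied to the list directly as well).

```python
import numpy as np


def split_sequences(data, y, train_frac=0.7, seed=0):
    """Hypothetical helper: split a list of (x, 40) arrays and their labels by index.

    Every sample stays an independent array, so each split can later be padded
    or packed per batch instead of slicing one large PackedSequence.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_train = int(round(train_frac * len(data)))
    train_idx, valid_idx = idx[:n_train], idx[n_train:]
    y = np.asarray(y)
    X_train = [data[i] for i in train_idx]
    X_valid = [data[i] for i in valid_idx]
    return X_train, y[train_idx], X_valid, y[valid_idx]
```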
Should this pad-and-pack procedure instead be done after creating the DataLoader, right before feeding the input to the RNN?
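A common pattern along those lines, shown as a sketch rather than the only way, is to let the `DataLoader` pad each batch through a custom `collate_fn` and to call `pack_padded_sequence` inside the model's `forward`. `collate_variable_length` and `GRUClassifier` are hypothetical names; the GRU, hidden size and single-output head are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence
from torch.utils.data import DataLoader


def collate_variable_length(batch):
    """Hypothetical collate_fn: pad one batch of (X, y) pairs, X of shape (x, 40)."""
    xs, ys = zip(*batch)
    lengths = torch.tensor([x.shape[0] for x in xs])
    padded = pad_sequence(list(xs), batch_first=True, padding_value=0.0)  # (B, longest x, 40)
    return padded, lengths, torch.stack(ys)


class GRUClassifier(nn.Module):
    """Hypothetical model that packs the padded batch right before the RNN."""

    def __init__(self, n_features=40, hidden=64, n_out=1):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_out)

    def forward(self, padded, lengths):
        packed = pack_padded_sequence(padded, lengths.cpu(), batch_first=True, enforce_sorted=False)
        _, h_n = self.rnn(packed)  # h_n: (1, B, hidden), at each sequence's last real frame
        return self.head(h_n[-1])  # (B, n_out)


# Toy stand-ins for the real list of (x, 40) MFCC tensors and labels
torch.manual_seed(0)
data = [torch.randn(n, 40) for n in (60, 12, 35, 48, 7)]
labels = torch.randint(0, 2, (5,)).float()
dataset = list(zip(data, labels))  # a plain list works as a map-style dataset

loader = DataLoader(dataset, batch_size=2, shuffle=True, collate_fn=collate_variable_length)
model = GRUClassifier()
for padded, lengths, yb in loader:
    logits = model(padded, lengths)  # (B, 1)
```

Padding per batch also means each batch is only as long as its longest sample, rather than always 60 frames.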