Hello,
I am working on a time-series dataset with an LSTM. Each sequence has dimension S_i x 6, i.e. the sequences have different lengths. I first created a network (network1) and, in the forward function, padded each sequence so they all have the same length. Unfortunately, that network could not really learn the structure in the data. So I decided not to pad the sequences and rewrote the network (network2) so that the forward pass loops over each sequence in a batch, which, as mentioned before, have different lengths. And lo and behold, the network converges much better!
Questions:
- What exactly is the effect of padding on the network?
- Why does padding the sequences lead to worse convergence? (For reference, a small toy example of the pad/pack pipeline follows right after these questions.)
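For concreteness, here is a small toy example (my own illustration with made-up shapes, not code from my networks) of what the pad/pack pipeline in network1 does to a batch of variable-length sequences:

import torch
import torch.nn as nn

# two toy sequences of lengths 3 and 2, each with 6 features per timestep
seqs = [torch.randn(3, 6), torch.randn(2, 6)]
lengths = [s.size(0) for s in seqs]

# pad_sequence right-pads the shorter sequence with zeros -> shape [2, 3, 6]
padded = nn.utils.rnn.pad_sequence(seqs, batch_first=True)
print(padded.shape)   # torch.Size([2, 3, 6])
print(padded[1, 2])   # all zeros: the padded timestep of the shorter sequence

# pack_padded_sequence records the true lengths so the LSTM skips padded steps
packed = nn.utils.rnn.pack_padded_sequence(padded, lengths=lengths,
                                           batch_first=True, enforce_sorted=False)
print(packed.batch_sizes)   # tensor([2, 2, 1]): only one sequence is still active at step 3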
Network 1: With padding
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepIO(nn.Module):
    def __init__(self):
        super(DeepIO, self).__init__()
        self.rnn = nn.LSTM(input_size=6, hidden_size=512,
                           num_layers=2, bidirectional=True)
        self.drop_out = nn.Dropout(0.25)
        self.fc1 = nn.Linear(512, 256)
        self.bn1 = nn.BatchNorm1d(256)
        self.fc_out = nn.Linear(256, 7)

    def forward(self, x):
        """
        args:
            x: a list of B tensors, each of dimension [S_i x 6]
        """
        lengths = [x_.size(0) for x_ in x]  # length of each sequence in the batch
        x_padded = nn.utils.rnn.pad_sequence(x, batch_first=True)  # pad all sequences to the same length
        b, s, n = x_padded.shape
        # pack the padded sequences so the LSTM can skip the padded timesteps
        x_padded = nn.utils.rnn.pack_padded_sequence(x_padded, lengths=lengths,
                                                     batch_first=True, enforce_sorted=False)
        # calc the feature vector from the latent space
        out, hidden = self.rnn(x_padded)
        # unpack the feature vector back to a padded tensor
        out, lens_unpacked = nn.utils.rnn.pad_packed_sequence(out, batch_first=True)
        out = out.view(b, s, 2, 512)  # [batch, seq, num_directions, hidden_size]
        # many-to-one RNN: take the last timestep of the forward direction
        y = out[:, -1, 0]
        y = F.relu(self.fc1(y), inplace=True)
        y = self.bn1(y)
        y = self.drop_out(y)
        y = self.fc_out(y)
        return y
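One detail worth noting about the "get the last result" step above: pad_packed_sequence fills the positions beyond each sequence's true length with zeros, so out[:, -1, 0] returns those padded zeros for every sequence shorter than the longest one in the batch. A minimal sketch (my own addition, not part of the original forward) of indexing the last valid timestep per sequence with lens_unpacked instead:

# out has shape [b, s, 2, 512] after the view; lens_unpacked holds the true lengths
idx = (lens_unpacked - 1).to(out.device)              # index of each sequence's last valid timestep
y = out[torch.arange(b, device=out.device), idx, 0]   # [b, 512], forward direction only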
Network 2: Without padding
class DeepIO(nn.Module):
    def __init__(self):
        super(DeepIO, self).__init__()
        self.rnn = nn.LSTM(input_size=6, hidden_size=512,
                           num_layers=2, bidirectional=True)
        self.drop_out = nn.Dropout(0.25)
        self.fc1 = nn.Linear(512, 256)
        self.bn1 = nn.BatchNorm1d(256)
        self.fc_out = nn.Linear(256, 7)

    def forward(self, x):
        """
        args:
            x: a list of B tensors, each of dimension [S_i x 6]
        """
        outputs = []
        # iterate through all sequences in the batch, one at a time
        for xx in x:
            s, n = xx.shape
            out, hidden = self.rnn(xx.unsqueeze(1))  # [S_i, 1, 2*512]
            out = out.view(s, 1, 2, 512)             # [seq, batch=1, num_directions, hidden_size]
            out = out[-1, :, 0]                      # last timestep, forward direction
            outputs.append(out.squeeze())
        outputs = torch.stack(outputs)
        y = F.relu(self.fc1(outputs), inplace=True)
        y = self.bn1(y)
        y = self.drop_out(y)
        y = self.fc_out(y)
        return y
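For reference, both versions are called the same way, with a plain Python list of variable-length [S_i x 6] tensors rather than a single stacked tensor (the sequence lengths below are made up for illustration):

import torch

model = DeepIO()
model.eval()   # deterministic forward pass (dropout off, batchnorm uses running stats)

# a batch of three sequences with different lengths S_i
batch = [torch.randn(100, 6), torch.randn(80, 6), torch.randn(123, 6)]

with torch.no_grad():
    y = model(batch)
print(y.shape)   # torch.Size([3, 7])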
Thanks
Arash