I am trying to train a 3-layer LSTM model on sequence samples of varying length, using pack_padded_sequence and pad_packed_sequence. I followed the examples in (About the variable length input in RNN scenario) to implement my own version, but I ran into a similar situation to the one posted in that thread: my network does not converge. My code is shown below.
The code used for batch sorting (to make sure the samples in each batch are sorted by length, in descending order):
def sort_batch(x, length):
    batch_size = x.size(0)                       # get size of batch
    sorted_length, sorted_idx = length.sort()    # sort sequence lengths (ascending)
    reverse_idx = torch.linspace(batch_size - 1, 0, batch_size).long()
    reverse_idx = reverse_idx.cuda(GPU_ID)
    sorted_length = sorted_length[reverse_idx]   # flip to descending order
    sorted_idx = sorted_idx[reverse_idx]
    sorted_data = x[sorted_idx]                  # samples sorted in descending length order
    return sorted_data, sorted_length
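As a side note, I sanity-checked the sorting logic on a tiny CPU-only example (no `.cuda()`, made-up toy data); `torch.sort` also supports a `descending=True` flag directly, which should give the same result as the linspace-reversal above:

```python
import torch

# Toy batch of 3 "samples" with their sequence lengths (illustrative data only)
x = torch.tensor([[1.0], [2.0], [3.0]])
length = torch.tensor([5, 9, 2])

# Sort lengths in descending order and reorder the data to match
sorted_length, sorted_idx = length.sort(descending=True)
sorted_data = x[sorted_idx]

print(sorted_length.tolist())            # [9, 5, 2]
print(sorted_data.squeeze(1).tolist())   # [2.0, 1.0, 3.0]
```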
def forward(self, x, l):
    x = pack_padded_sequence(x, list(l.data), batch_first=True)  # pack the padded batch
    r_out, (h_n, h_c) = self.rnn(x, None)                        # run the LSTM
    r_out, _ = pad_packed_sequence(r_out, batch_first=True)      # unpack the batch
    # build indices of the last valid timestep of each sequence
    idx = (l - 1).view(-1, 1).expand(r_out.size(0), r_out.size(2)).unsqueeze(1).long()
    r_out = r_out.gather(1, idx).squeeze().unsqueeze(1)          # last hidden output per sequence
    out = self.out(r_out[:, -1, :])
    return out
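For context, here is a minimal CPU-only sketch of the pack → LSTM → unpack path and the last-timestep gather I am attempting. The sizes here are made-up toy values, not my real model's; for a packed input, `h_n[-1]` should already hold the last layer's hidden state at each sequence's true last step, so it should match the gathered output:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
batch, max_len, feat, hidden = 3, 4, 5, 7   # toy sizes for illustration
lengths = torch.tensor([4, 3, 2])           # already in descending order
x = torch.randn(batch, max_len, feat)       # padded batch (batch_first)

rnn = nn.LSTM(feat, hidden, num_layers=3, batch_first=True)
packed = pack_padded_sequence(x, lengths.tolist(), batch_first=True)
r_out, (h_n, h_c) = rnn(packed)
r_out, out_lens = pad_packed_sequence(r_out, batch_first=True)

# Gather the output at the last valid timestep of each sequence
idx = (lengths - 1).view(-1, 1, 1).expand(batch, 1, hidden)
last = r_out.gather(1, idx).squeeze(1)

print(r_out.shape)   # torch.Size([3, 4, 7])
print(last.shape)    # torch.Size([3, 7])
```

The gathered `last` should equal `h_n[-1]` elementwise, which is a quick way to verify the indexing is right.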
for step, (x, y, l) in enumerate(dset_loaders['Train']):            # gives batch data
    b_x = Variable(x.cuda(GPU_ID).view(-1, TIME_STEP, INPUT_SIZE))  # reshape x to (batch, time_step, input_size)
    b_y = torch.squeeze(Variable(y.cuda(GPU_ID)))                   # batch y
    b_l = torch.squeeze(Variable(l.cuda(GPU_ID)))                   # batch lengths
    b_x, b_l = sort_batch(b_x, b_l)

    output = model(b_x, b_l)          # rnn output
    loss = loss_func(output, b_y)     # cross-entropy loss
    optimizer.zero_grad()             # clear gradients for this training step
    loss.backward()                   # backpropagation, compute gradients
    optimizer.step()                  # apply gradients
The output is :
Data Loading : 100% [==========================================================================================] Elapsed Time: 0:01:15 | Time: 0:01:15
Train Set : 60 Categories, 33212 Samples in total
Data Loading : 100% [==========================================================================================] Elapsed Time: 0:00:42 | Time: 0:00:42
Test Set : 60 Categories, 17885 Samples in total
('\n', RNN (
  (rnn): LSTM(75, 100, num_layers=3, batch_first=True)
  (out): Linear (100 -> 60)
))
LR is set to 0.01
Epoch: 0 | train loss: 4.0995 | test accuracy: 0.0167
Epoch: 0 | train loss: 4.1131 | test accuracy: 0.0165
Epoch: 0 | train loss: 4.1193 | test accuracy: 0.0161
Epoch: 0 | train loss: 4.1162 | test accuracy: 0.0208
Epoch: 0 | train loss: 4.1005 | test accuracy: 0.0166
Epoch: 0 | train loss: 4.1043 | test accuracy: 0.0164
Epoch: 0 | train loss: 4.0987 | test accuracy: 0.0167
Epoch: 0 | train loss: 4.1065 | test accuracy: 0.0167
Epoch: 0 | train loss: 4.1033 | test accuracy: 0.0167
I am not sure where the problem is. Am I using pack_padded_sequence and pad_packed_sequence incorrectly? Is there any way I can debug this model?
Besides, I want to use gradcheck to debug my code, but I don't know how to apply it in my case. Can anyone help?
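From the docs, my understanding is that gradcheck compares analytic gradients against numerical ones, and that it needs double-precision inputs with requires_grad=True to be reliable. Here is a tiny standalone sketch of how I think it would be called, using a small linear layer as a stand-in (the layer and sizes are made up; my full model is presumably too large for this to be practical as-is):

```python
import torch
from torch.autograd import gradcheck

# gradcheck needs double precision; a float32 model usually fails
# the numerical comparison due to rounding error.
lin = torch.nn.Linear(4, 2).double()
inp = torch.randn(3, 4, dtype=torch.double, requires_grad=True)

# Returns True if analytic and numerical gradients match within tolerance,
# otherwise raises an error describing the mismatch.
ok = gradcheck(lin, (inp,), eps=1e-6, atol=1e-4)
print(ok)  # True
```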
Thank you so much.