I am trying to train a 3-layer LSTM model on sequence samples of varying length, using pack_padded_sequence and pad_packed_sequence. I followed the examples in (About the variable length input in RNN scenario) to implement my own version, but I ran into a similar situation to the one posted in that thread: my network does not converge. My code is shown below.
The code used for batch sorting (to make sure the samples in each batch are sorted by length, in descending order):
def sort_batch(x, length):
    batch_size = x.size(0)                       # get size of batch
    sorted_length, sorted_idx = length.sort()    # sort sequence lengths (ascending)
    reverse_idx = torch.linspace(batch_size - 1, 0, batch_size).long()
    reverse_idx = reverse_idx.cuda(GPU_ID)
    sorted_length = sorted_length[reverse_idx]   # flip to descending order
    sorted_idx = sorted_idx[reverse_idx]
    sorted_data = x[sorted_idx]                  # samples sorted in descending length order
    return sorted_data, sorted_length
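As a side note, I sanity-checked the sorting logic on a tiny CPU-only example (no `.cuda()`, made-up toy data); `torch.sort` also supports a `descending=True` flag directly, which should give the same result as the linspace-reversal above:

```python
import torch

# Toy batch of 3 "samples" with their sequence lengths (illustrative data only)
x = torch.tensor([[1.0], [2.0], [3.0]])
length = torch.tensor([5, 9, 2])

# Sort lengths in descending order and reorder the data to match
sorted_length, sorted_idx = length.sort(descending=True)
sorted_data = x[sorted_idx]

print(sorted_length.tolist())            # [9, 5, 2]
print(sorted_data.squeeze(1).tolist())   # [2.0, 1.0, 3.0]
```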
def forward(self, x, l):
    x = pack_padded_sequence(x, list(l.data), batch_first=True)  # pack the padded batch
    r_out, (h_n, h_c) = self.rnn(x, None)                        # run the LSTM
    r_out, _ = pad_packed_sequence(r_out, batch_first=True)      # unpack the batch
    # build indices of the last valid timestep of each sequence
    idx = (l - 1).view(-1, 1).expand(r_out.size(0), r_out.size(2)).unsqueeze(1).long()
    r_out = r_out.gather(1, idx).squeeze().unsqueeze(1)          # last hidden output per sequence
    out = self.out(r_out[:, -1, :])
    return out
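For context, here is a minimal CPU-only sketch of the pack → LSTM → unpack path and the last-timestep gather I am attempting. The sizes here are made-up toy values, not my real model's; for a packed input, `h_n[-1]` should already hold the last layer's hidden state at each sequence's true last step, so it should match the gathered output:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
batch, max_len, feat, hidden = 3, 4, 5, 7   # toy sizes for illustration
lengths = torch.tensor([4, 3, 2])           # already in descending order
x = torch.randn(batch, max_len, feat)       # padded batch (batch_first)

rnn = nn.LSTM(feat, hidden, num_layers=3, batch_first=True)
packed = pack_padded_sequence(x, lengths.tolist(), batch_first=True)
r_out, (h_n, h_c) = rnn(packed)
r_out, out_lens = pad_packed_sequence(r_out, batch_first=True)

# Gather the output at the last valid timestep of each sequence
idx = (lengths - 1).view(-1, 1, 1).expand(batch, 1, hidden)
last = r_out.gather(1, idx).squeeze(1)

print(r_out.shape)   # torch.Size([3, 4, 7])
print(last.shape)    # torch.Size([3, 7])
```

The gathered `last` should equal `h_n[-1]` elementwise, which is a quick way to verify the indexing is right.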
for step, (x, y, l) in enumerate(dset_loaders['Train']):            # gives batch data
    b_x = Variable(x.cuda(GPU_ID).view(-1, TIME_STEP, INPUT_SIZE))  # reshape x to (batch, time_step, input_size)
    b_y = torch.squeeze(Variable(y.cuda(GPU_ID)))                   # batch y
    b_l = torch.squeeze(Variable(l.cuda(GPU_ID)))                   # batch lengths
    b_x, b_l = sort_batch(b_x, b_l)

    output = model(b_x, b_l)          # rnn output
    loss = loss_func(output, b_y)     # cross-entropy loss
    optimizer.zero_grad()             # clear gradients for this training step
    loss.backward()                   # backpropagation, compute gradients
    optimizer.step()                  # apply gradients
The output is :
Data Loading : 100% [==========================================================================================] Elapsed Time: 0:01:15 | Time: 0:01:15
Train Set : 60 Categories, 33212 Samples in total
Data Loading : 100% [==========================================================================================] Elapsed Time: 0:00:42 | Time: 0:00:42
Test Set : 60 Categories, 17885 Samples in total
('\n', RNN (
  (rnn): LSTM(75, 100, num_layers=3, batch_first=True)
  (out): Linear (100 -> 60)
))
LR is set to 0.01
Epoch: 0 | train loss: 4.0995 | test accuracy: 0.0167
Epoch: 0 | train loss: 4.1131 | test accuracy: 0.0165
Epoch: 0 | train loss: 4.1193 | test accuracy: 0.0161
Epoch: 0 | train loss: 4.1162 | test accuracy: 0.0208
Epoch: 0 | train loss: 4.1005 | test accuracy: 0.0166
Epoch: 0 | train loss: 4.1043 | test accuracy: 0.0164
Epoch: 0 | train loss: 4.0987 | test accuracy: 0.0167
Epoch: 0 | train loss: 4.1065 | test accuracy: 0.0167
Epoch: 0 | train loss: 4.1033 | test accuracy: 0.0167
I am not sure where the problem is. Am I using pack_padded_sequence and pad_packed_sequence incorrectly? Is there any way I can debug this model?
Besides, I want to use gradcheck to debug my code, but I don't know how to apply it in my case. Can anyone help?
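From the docs, my understanding is that gradcheck compares analytic gradients against numerical ones, and that it needs double-precision inputs with requires_grad=True to be reliable. Here is a tiny standalone sketch of how I think it would be called, using a small linear layer as a stand-in (the layer and sizes are made up; my full model is presumably too large for this to be practical as-is):

```python
import torch
from torch.autograd import gradcheck

# gradcheck needs double precision; a float32 model usually fails
# the numerical comparison due to rounding error.
lin = torch.nn.Linear(4, 2).double()
inp = torch.randn(3, 4, dtype=torch.double, requires_grad=True)

# Returns True if analytic and numerical gradients match within tolerance,
# otherwise raises an error describing the mismatch.
ok = gradcheck(lin, (inp,), eps=1e-6, atol=1e-4)
print(ok)  # True
```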
Thank you so much.