I am trying to train a 3-layer LSTM model on sequence samples of varying length, using pack_padded_sequence and pad_packed_sequence. I followed the examples in (About the variable length input in RNN scenario) to implement my own version, but I ran into a similar situation to the one posted in that thread: my network does not converge. My code is shown below.
The code used for batch sorting (to make sure the samples in each batch are sorted by length, in descending order):
def sort_batch(x, length):
    batch_size = x.size(0)                       # get size of batch
    sorted_length, sorted_idx = length.sort()    # sort sequence lengths (ascending)
    reverse_idx = torch.linspace(batch_size - 1, 0, batch_size).long()
    reverse_idx = reverse_idx.cuda(GPU_ID)
    sorted_length = sorted_length[reverse_idx]   # flip to descending order
    sorted_idx = sorted_idx[reverse_idx]
    sorted_data = x[sorted_idx]                  # samples sorted in descending length order
    return sorted_data, sorted_length
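As a side note, I sanity-checked the sorting logic on a tiny CPU-only example (no `.cuda()`, made-up toy data); `torch.sort` also supports a `descending=True` flag directly, which should give the same result as the linspace-reversal above:

```python
import torch

# Toy batch of 3 "samples" with their sequence lengths (illustrative data only)
x = torch.tensor([[1.0], [2.0], [3.0]])
length = torch.tensor([5, 9, 2])

# Sort lengths in descending order and reorder the data to match
sorted_length, sorted_idx = length.sort(descending=True)
sorted_data = x[sorted_idx]

print(sorted_length.tolist())            # [9, 5, 2]
print(sorted_data.squeeze(1).tolist())   # [2.0, 1.0, 3.0]
```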
def forward(self, x, l):
    x = pack_padded_sequence(x, list(l.data), batch_first=True)  # pack the padded batch
    r_out, (h_n, h_c) = self.rnn(x, None)                        # run the LSTM
    r_out, _ = pad_packed_sequence(r_out, batch_first=True)      # unpack the batch
    # build indices of the last valid timestep of each sequence
    idx = (l - 1).view(-1, 1).expand(r_out.size(0), r_out.size(2)).unsqueeze(1).long()
    r_out = r_out.gather(1, idx).squeeze().unsqueeze(1)          # last hidden output per sequence
    out = self.out(r_out[:, -1, :])
    return out
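For context, here is a minimal CPU-only sketch of the pack → LSTM → unpack path and the last-timestep gather I am attempting. The sizes here are made-up toy values, not my real model's; for a packed input, `h_n[-1]` should already hold the last layer's hidden state at each sequence's true last step, so it should match the gathered output:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
batch, max_len, feat, hidden = 3, 4, 5, 7   # toy sizes for illustration
lengths = torch.tensor([4, 3, 2])           # already in descending order
x = torch.randn(batch, max_len, feat)       # padded batch (batch_first)

rnn = nn.LSTM(feat, hidden, num_layers=3, batch_first=True)
packed = pack_padded_sequence(x, lengths.tolist(), batch_first=True)
r_out, (h_n, h_c) = rnn(packed)
r_out, out_lens = pad_packed_sequence(r_out, batch_first=True)

# Gather the output at the last valid timestep of each sequence
idx = (lengths - 1).view(-1, 1, 1).expand(batch, 1, hidden)
last = r_out.gather(1, idx).squeeze(1)

print(r_out.shape)   # torch.Size([3, 4, 7])
print(last.shape)    # torch.Size([3, 7])
```

The gathered `last` should equal `h_n[-1]` elementwise, which is a quick way to verify the indexing is right.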
for step, (x, y, l) in enumerate(dset_loaders['Train']):            # gives batch data
    b_x = Variable(x.cuda(GPU_ID).view(-1, TIME_STEP, INPUT_SIZE))  # reshape x to (batch, time_step, input_size)
    b_y = torch.squeeze(Variable(y.cuda(GPU_ID)))                   # batch y
    b_l = torch.squeeze(Variable(l.cuda(GPU_ID)))                   # batch lengths
    b_x, b_l = sort_batch(b_x, b_l)

    output = model(b_x, b_l)          # rnn output
    loss = loss_func(output, b_y)     # cross-entropy loss
    optimizer.zero_grad()             # clear gradients for this training step
    loss.backward()                   # backpropagation, compute gradients
    optimizer.step()                  # apply gradients
The output is :
Data Loading : 100% [==========================================================================================] Elapsed Time: 0:01:15 | Time: 0:01:15
Train Set : 60 Categories, 33212 Samples in total
Data Loading : 100% [==========================================================================================] Elapsed Time: 0:00:42 | Time: 0:00:42
Test Set : 60 Categories, 17885 Samples in total
('\n', RNN (
  (rnn): LSTM(75, 100, num_layers=3, batch_first=True)
  (out): Linear (100 -> 60)
))
LR is set to 0.01
Epoch: 0 | train loss: 4.0995 | test accuracy: 0.0167
Epoch: 0 | train loss: 4.1131 | test accuracy: 0.0165
Epoch: 0 | train loss: 4.1193 | test accuracy: 0.0161
Epoch: 0 | train loss: 4.1162 | test accuracy: 0.0208
Epoch: 0 | train loss: 4.1005 | test accuracy: 0.0166
Epoch: 0 | train loss: 4.1043 | test accuracy: 0.0164
Epoch: 0 | train loss: 4.0987 | test accuracy: 0.0167
Epoch: 0 | train loss: 4.1065 | test accuracy: 0.0167
Epoch: 0 | train loss: 4.1033 | test accuracy: 0.0167
I am not sure where the problem is. Am I using pack_padded_sequence and pad_packed_sequence incorrectly? Is there any way I can debug this model?
Besides, I want to use gradcheck to debug my code, but I don't know how to apply it in my case. Can anyone help?
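From the docs, my understanding is that gradcheck compares analytic gradients against numerical ones, and that it needs double-precision inputs with requires_grad=True to be reliable. Here is a tiny standalone sketch of how I think it would be called, using a small linear layer as a stand-in (the layer and sizes are made up; my full model is presumably too large for this to be practical as-is):

```python
import torch
from torch.autograd import gradcheck

# gradcheck needs double precision; a float32 model usually fails
# the numerical comparison due to rounding error.
lin = torch.nn.Linear(4, 2).double()
inp = torch.randn(3, 4, dtype=torch.double, requires_grad=True)

# Returns True if analytic and numerical gradients match within tolerance,
# otherwise raises an error describing the mismatch.
ok = gradcheck(lin, (inp,), eps=1e-6, atol=1e-4)
print(ok)  # True
```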
Thank you so much.