I am now trying to train a 3-layer LSTM model with sequence samples of various lengths. The tools that I use are `pack_padded_sequence` and `pad_packed_sequence`. I referred to the examples in (About the variable length input in RNN scenario) to implement my own version. However, I ran into a similar situation as posted in that thread: my network does not converge. My code is shown below.

**The code used for batch sorting:** (to make sure the samples in the batch are sorted by length, in descending order)

```
def sort_batch(x, length):
    batch_size = x.size(0)                          # batch size
    sorted_length, sorted_idx = length.sort()       # sort sequence lengths (ascending)
    reverse_idx = torch.linspace(batch_size - 1, 0, batch_size).long()
    reverse_idx = reverse_idx.cuda(GPU_ID)
    sorted_length = sorted_length[reverse_idx]      # flip to descending order
    sorted_idx = sorted_idx[reverse_idx]
    sorted_data = x[sorted_idx]                     # samples in descending length order
    return sorted_data, sorted_length
```
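For reference, a sketch of the same sort done more directly, assuming a PyTorch version where `sort` accepts a `descending` flag (the manual `reverse_idx` flip then becomes unnecessary):

```python
import torch

def sort_batch(x, length):
    # sort lengths in descending order and reorder the batch to match
    sorted_length, sorted_idx = length.sort(0, descending=True)
    return x[sorted_idx], sorted_length

x = torch.randn(4, 5, 3)                 # (batch, time_step, input_size)
length = torch.tensor([2, 5, 3, 4])
sorted_x, sorted_length = sort_batch(x, length)
print(sorted_length.tolist())            # [5, 4, 3, 2]
```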

**forward function:**

```
def forward(self, x, l):
    x = pack_padded_sequence(x, list(l.data), batch_first=True)   # pack batch
    r_out, (h_n, h_c) = self.rnn(x, None)                         # run the LSTM
    r_out, _ = pad_packed_sequence(r_out, batch_first=True)       # unpack batch
    # gather the last valid output of each sequence
    idx = (l - 1).view(-1, 1).expand(r_out.size(0), r_out.size(2)).unsqueeze(1).long()
    r_out = r_out.gather(1, idx).squeeze().unsqueeze(1)
    out = self.out(r_out[:, -1, :])
    return out
```
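As a sanity check, here is a self-contained round trip through the two functions (sizes are made up for illustration). With a packed input, the top layer's `h_n` should already equal the gathered last valid output of each sequence, which gives one way to verify the gather indexing:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

rnn = nn.LSTM(input_size=3, hidden_size=7, num_layers=3, batch_first=True)
x = torch.randn(4, 5, 3)                 # padded batch: (batch, time_step, input_size)
lengths = torch.tensor([5, 4, 3, 2])     # already sorted in descending order

packed = pack_padded_sequence(x, lengths, batch_first=True)
packed_out, (h_n, c_n) = rnn(packed)
out, _ = pad_packed_sequence(packed_out, batch_first=True)   # (4, 5, 7)

# gather the last valid time step of each sequence ...
idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, out.size(2))
last = out.gather(1, idx).squeeze(1)     # (4, 7)

# ... which should match the top layer's final hidden state
assert torch.allclose(last, h_n[-1], atol=1e-6)
```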

**training :**

```
for step, (x, y, l) in enumerate(dset_loaders['Train']):              # gives batch data
    b_x = Variable(x.cuda(GPU_ID).view(-1, TIME_STEP, INPUT_SIZE))    # reshape x to (batch, time_step, input_size)
    b_y = torch.squeeze(Variable(y.cuda(GPU_ID)))                     # batch y
    b_l = torch.squeeze(Variable(l.cuda(GPU_ID)))                     # batch lengths
    b_x, b_l = sort_batch(b_x, b_l)
    output = model(b_x, b_l)                                          # rnn output
    loss = loss_func(output, b_y)                                     # cross entropy loss
    optimizer.zero_grad()                                             # clear gradients for this training step
    loss.backward()                                                   # backpropagation, compute gradients
    optimizer.step()                                                  # apply gradients
```

**The output is :**

```
Data Loading : 100% [==========================================================================================] Elapsed Time: 0:01:15 | Time: 0:01:15
Train Set : 60 Categories, 33212 Samples in total
Data Loading : 100% [==========================================================================================] Elapsed Time: 0:00:42 | Time: 0:00:42
Test Set : 60 Categories, 17885 Samples in total
('\n', RNN (
(rnn): LSTM(75, 100, num_layers=3, batch_first=True)
(out): Linear (100 -> 60)
))
LR is set to 0.01
Epoch: 0 | train loss: 4.0995 | test accuracy: 0.0167
Epoch: 0 | train loss: 4.1131 | test accuracy: 0.0165
Epoch: 0 | train loss: 4.1193 | test accuracy: 0.0161
Epoch: 0 | train loss: 4.1162 | test accuracy: 0.0208
Epoch: 0 | train loss: 4.1005 | test accuracy: 0.0166
Epoch: 0 | train loss: 4.1043 | test accuracy: 0.0164
Epoch: 0 | train loss: 4.0987 | test accuracy: 0.0167
Epoch: 0 | train loss: 4.1065 | test accuracy: 0.0167
Epoch: 0 | train loss: 4.1033 | test accuracy: 0.0167
```

I am not sure where the problem is. Am I using `pack_padded_sequence` and `pad_packed_sequence` in a wrong way? Is there any way I can debug this model?

Besides, I want to use `gradcheck` to debug my code, but I don't know how to use it in my case. Can anyone help?
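On `gradcheck`: as I understand it, it compares a callable's analytic gradients against finite differences, and it needs double-precision inputs with `requires_grad=True`. A minimal sketch on a small standalone layer (not the full model, which would be slow in float64):

```python
import torch
from torch.autograd import gradcheck

lin = torch.nn.Linear(4, 3).double()     # module under test, in float64
x = torch.randn(2, 4, dtype=torch.double, requires_grad=True)

# raises an error if analytic and numeric gradients disagree, else returns True
print(gradcheck(lin, (x,), eps=1e-6, atol=1e-4))   # True
```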

Thank you so much.