Question about pack_padded_sequence

Hi!
This code works fine and BiRNN converges:

def forward(self, x):
    # Set initial states
    h0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()  # 2 for bidirection
    c0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()
    # Forward propagate RNN
    #pack = torch.nn.utils.rnn.pack_padded_sequence(x, batch_size*[28], batch_first=True)
    out, _ = self.lstm(x, (h0, c0))
    # Decode hidden state of last time step
    #out = out[0].view(-1, sequence_length, hidden_size*2)
    out = self.fc(out[:, -1, :])
    return out

Result:

Epoch [1/2], Step [100/600], Loss: 0.6213
Epoch [1/2], Step [200/600], Loss: 0.2935
Epoch [1/2], Step [300/600], Loss: 0.2289
Epoch [1/2], Step [400/600], Loss: 0.1926
Epoch [1/2], Step [500/600], Loss: 0.0635
Epoch [1/2], Step [600/600], Loss: 0.0311
Epoch [2/2], Step [100/600], Loss: 0.1164
Epoch [2/2], Step [200/600], Loss: 0.0957
Epoch [2/2], Step [300/600], Loss: 0.1021
Epoch [2/2], Step [400/600], Loss: 0.0675
Epoch [2/2], Step [500/600], Loss: 0.1220
Epoch [2/2], Step [600/600], Loss: 0.0311
Test Accuracy of the model on the 10000 test images: 97 %

But when I use pack_padded_sequence, the loss is stuck:

def forward(self, x):
    # Set initial states
    h0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()  # 2 for bidirection
    c0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()
    # Forward propagate RNN
    pack = torch.nn.utils.rnn.pack_padded_sequence(x, batch_size*[28], batch_first=True)
    out, _ = self.lstm(pack, (h0, c0))
    # Decode hidden state of last time step
    out = out[0].view(-1, sequence_length, hidden_size*2)
    out = self.fc(out[:, -1, :])
    return out

Result:

Epoch [1/2], Step [100/600], Loss: 2.3037
Epoch [1/2], Step [200/600], Loss: 2.3045
Epoch [1/2], Step [300/600], Loss: 2.3089
Epoch [1/2], Step [400/600], Loss: 2.2948
Epoch [1/2], Step [500/600], Loss: 2.2972
Epoch [1/2], Step [600/600], Loss: 2.3111
Epoch [2/2], Step [100/600], Loss: 2.3000
Epoch [2/2], Step [200/600], Loss: 2.2944
Epoch [2/2], Step [300/600], Loss: 2.2878
Epoch [2/2], Step [400/600], Loss: 2.2956
Epoch [2/2], Step [500/600], Loss: 2.2993
Epoch [2/2], Step [600/600], Loss: 2.3057
Test Accuracy of the model on the 10000 test images: 11 %

Why does this happen? As I understand it, it should work.
There is no real reason to use pack_padded_sequence here, since the images all have a fixed sequence length, but I am learning PyTorch and trying different things.

Thanks!

I think you unpacked the sequence incorrectly. Use pad_packed_sequence on the RNN output.
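To illustrate the suggestion above, here is a minimal sketch of packing and then correctly unpacking with pad_packed_sequence. It uses plain tensors (current PyTorch, no Variable wrapper or .cuda()) and made-up sizes matching the 28x28 MNIST rows in the question:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

batch_size, seq_len, input_size, hidden_size = 4, 28, 28, 16
lstm = nn.LSTM(input_size, hidden_size, num_layers=1,
               batch_first=True, bidirectional=True)

x = torch.randn(batch_size, seq_len, input_size)
h0 = torch.zeros(2, batch_size, hidden_size)  # 2 for bidirection
c0 = torch.zeros(2, batch_size, hidden_size)

# Pack (every sequence has the same length here, as in the question)
packed = pack_padded_sequence(x, batch_size * [seq_len], batch_first=True)
packed_out, _ = lstm(packed, (h0, c0))

# Unpack with pad_packed_sequence instead of calling .view() on the
# PackedSequence's flattened data tensor
out, lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)  # torch.Size([4, 28, 32])
```

The key point is that packed_out.data is laid out time-step-major across the whole batch, so reshaping it with .view(-1, sequence_length, hidden_size*2) scrambles the batch and time dimensions; pad_packed_sequence restores the (batch, seq, features) layout.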


Why is this line commented out here but not in the next snippet?

As apaszke correctly pointed out, the error was in out = out[0].view(-1, sequence_length, hidden_size*2): I should have used pad_packed_sequence instead.