Hi!
This code works fine and the BiRNN converges:

```python
def forward(self, x):
    # Set initial states
    h0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()  # 2 for bidirection
    c0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()
    # Forward propagate RNN
    out, _ = self.lstm(x, (h0, c0))
    # Decode hidden state of last time step
    # out = out.view(-1, sequence_length, hidden_size * 2)
    out = self.fc(out[:, -1, :])
    return out
```

Result:

```
Epoch [1/2], Step [100/600], Loss: 0.6213
Epoch [1/2], Step [200/600], Loss: 0.2935
Epoch [1/2], Step [300/600], Loss: 0.2289
Epoch [1/2], Step [400/600], Loss: 0.1926
Epoch [1/2], Step [500/600], Loss: 0.0635
Epoch [1/2], Step [600/600], Loss: 0.0311
Epoch [2/2], Step [100/600], Loss: 0.1164
Epoch [2/2], Step [200/600], Loss: 0.0957
Epoch [2/2], Step [300/600], Loss: 0.1021
Epoch [2/2], Step [400/600], Loss: 0.0675
Epoch [2/2], Step [500/600], Loss: 0.1220
Epoch [2/2], Step [600/600], Loss: 0.0311
Test Accuracy of the model on the 10000 test images: 97 %
```

But when I use `pack_padded_sequence`, the loss is stuck:

```python
def forward(self, x):
    # Set initial states
    h0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()  # 2 for bidirection
    c0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()
    # Forward propagate RNN
    out, _ = self.lstm(pack, (h0, c0))
    # Decode hidden state of last time step
    out = out.view(-1, sequence_length, hidden_size * 2)
    out = self.fc(out[:, -1, :])
    return out
```

Result:

```
Epoch [1/2], Step [100/600], Loss: 2.3037
Epoch [1/2], Step [200/600], Loss: 2.3045
Epoch [1/2], Step [300/600], Loss: 2.3089
Epoch [1/2], Step [400/600], Loss: 2.2948
Epoch [1/2], Step [500/600], Loss: 2.2972
Epoch [1/2], Step [600/600], Loss: 2.3111
Epoch [2/2], Step [100/600], Loss: 2.3000
Epoch [2/2], Step [200/600], Loss: 2.2944
Epoch [2/2], Step [300/600], Loss: 2.2878
Epoch [2/2], Step [400/600], Loss: 2.2956
Epoch [2/2], Step [500/600], Loss: 2.2993
Epoch [2/2], Step [600/600], Loss: 2.3057
Test Accuracy of the model on the 10000 test images: 11 %
```

Why does this happen? As I understand it, it should work.
There is no real need for `pack_padded_sequence` here, since my inputs are images with a fixed sequence length, but I am learning PyTorch and trying different things.

Thanks!

I think you unpacked the sequence incorrectly. Use `pad_packed_sequence` on the RNN output.
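For reference, a minimal sketch of the pack/unpack round trip (the toy shapes and the `lengths` list are illustrative, not from the original post):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Toy batch: 3 sequences with feature size 5, padded to length 4
x = torch.zeros(3, 4, 5)            # (batch, seq_len, input_size)
lengths = [4, 3, 2]                 # true lengths, sorted in decreasing order

lstm = torch.nn.LSTM(input_size=5, hidden_size=7,
                     num_layers=1, batch_first=True, bidirectional=True)

packed = pack_padded_sequence(x, lengths, batch_first=True)
packed_out, _ = lstm(packed)        # the LSTM returns a PackedSequence here

# pad_packed_sequence restores a regular padded tensor from the PackedSequence;
# calling view() on packed_out.data instead would scramble the time steps.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)                    # torch.Size([3, 4, 14]) -- hidden_size * 2 for bidirection
```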


Why is this commented here & not in the next snippet?

As apaszke correctly mentioned, the error was in `out = out.view(-1, sequence_length, hidden_size * 2)`; I should have used `pad_packed_sequence` instead.
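Putting the fix together, a corrected forward pass might look like the sketch below. This is a self-contained CPU version (no `Variable`/`.cuda()`, which are not needed in recent PyTorch), and the `lengths` argument and the packing inside `forward` are my additions for illustration:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class BiRNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(BiRNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_size * 2, num_classes)

    def forward(self, x, lengths):
        # Set initial states (2 * num_layers for bidirection)
        h0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)
        c0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)
        packed = pack_padded_sequence(x, lengths, batch_first=True)
        out, _ = self.lstm(packed, (h0, c0))
        # The fix: unpack with pad_packed_sequence instead of view()
        out, _ = pad_packed_sequence(out, batch_first=True)
        # Decode hidden state of last time step
        out = self.fc(out[:, -1, :])
        return out

model = BiRNN(input_size=28, hidden_size=128, num_layers=2, num_classes=10)
x = torch.randn(4, 28, 28)              # batch of 4 images, each row as a time step
logits = model(x, lengths=[28, 28, 28, 28])
print(logits.shape)                     # torch.Size([4, 10])
```

Note that with genuinely variable lengths, `out[:, -1, :]` would pick up padding for the shorter sequences; here all lengths are equal (fixed-size images), so it is safe.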