Hi!

This code works fine and the BiRNN converges:

```python
def forward(self, x):
    # Set initial states (num_layers * 2 for bidirection)
    h0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()
    c0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()
    # Forward propagate RNN
    # pack = torch.nn.utils.rnn.pack_padded_sequence(x, batch_size * [28], batch_first=True)
    out, _ = self.lstm(x, (h0, c0))
    # Decode hidden state of last time step
    # out = out[0].view(-1, sequence_length, hidden_size * 2)
    out = self.fc(out[:, -1, :])
    return out
```

Result:

```
Epoch [1/2], Step [100/600], Loss: 0.6213
Epoch [1/2], Step [200/600], Loss: 0.2935
Epoch [1/2], Step [300/600], Loss: 0.2289
Epoch [1/2], Step [400/600], Loss: 0.1926
Epoch [1/2], Step [500/600], Loss: 0.0635
Epoch [1/2], Step [600/600], Loss: 0.0311
Epoch [2/2], Step [100/600], Loss: 0.1164
Epoch [2/2], Step [200/600], Loss: 0.0957
Epoch [2/2], Step [300/600], Loss: 0.1021
Epoch [2/2], Step [400/600], Loss: 0.0675
Epoch [2/2], Step [500/600], Loss: 0.1220
Epoch [2/2], Step [600/600], Loss: 0.0311
Test Accuracy of the model on the 10000 test images: 97 %
```

But when I use `pack_padded_sequence`, the loss is stuck:

```python
def forward(self, x):
    # Set initial states (num_layers * 2 for bidirection)
    h0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()
    c0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()
    # Forward propagate RNN
    pack = torch.nn.utils.rnn.pack_padded_sequence(x, batch_size * [28], batch_first=True)
    out, _ = self.lstm(pack, (h0, c0))
    # Decode hidden state of last time step
    out = out[0].view(-1, sequence_length, hidden_size * 2)
    out = self.fc(out[:, -1, :])
    return out
```

Result:

```
Epoch [1/2], Step [100/600], Loss: 2.3037
Epoch [1/2], Step [200/600], Loss: 2.3045
Epoch [1/2], Step [300/600], Loss: 2.3089
Epoch [1/2], Step [400/600], Loss: 2.2948
Epoch [1/2], Step [500/600], Loss: 2.2972
Epoch [1/2], Step [600/600], Loss: 2.3111
Epoch [2/2], Step [100/600], Loss: 2.3000
Epoch [2/2], Step [200/600], Loss: 2.2944
Epoch [2/2], Step [300/600], Loss: 2.2878
Epoch [2/2], Step [400/600], Loss: 2.2956
Epoch [2/2], Step [500/600], Loss: 2.2993
Epoch [2/2], Step [600/600], Loss: 2.3057
Test Accuracy of the model on the 10000 test images: 11 %
```

Why does this happen? As I understand it, this should work.

There is no real reason to use `pack_padded_sequence` here, since my images all have a fixed sequence length, but I am learning PyTorch and trying different things.
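For context, this is my mental model of the pack/unpack round-trip, as a standalone CPU sketch with made-up sizes (batch 3, sequence length 4, feature size 5), not my actual model code. My understanding is that `PackedSequence.data` is laid out time-step-major, so reading it back with `pad_packed_sequence` recovers the original `(batch, seq, feature)` tensor:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Fixed-length batch: (batch=3, seq=4, feature=5), batch_first layout
x = torch.randn(3, 4, 5)
packed = pack_padded_sequence(x, [4, 4, 4], batch_first=True)

# packed.data is concatenated time-step-major: all batch elements at t=0,
# then all at t=1, etc. -- it is NOT laid out as (batch, seq, feature).
print(packed.data.shape)  # torch.Size([12, 5])

# Unpacking restores the original batch-first tensor exactly.
unpacked, lengths = pad_packed_sequence(packed, batch_first=True)
print(torch.equal(unpacked, x))  # True
```

If that model is right, then viewing `packed.data` directly as `(batch, seq, hidden)` would scramble batch and time axes, but maybe I am missing something.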

Thanks!