Hi!

This code works fine and the BiRNN converges:

```python
def forward(self, x):
    # Set initial states (num_layers * 2 for bidirection)
    h0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()
    c0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()
    # Forward propagate RNN
    # pack = torch.nn.utils.rnn.pack_padded_sequence(x, batch_size * [28], batch_first=True)
    out, _ = self.lstm(x, (h0, c0))
    # Decode hidden state of last time step
    # out = out[0].view(-1, sequence_length, hidden_size * 2)
    out = self.fc(out[:, -1, :])
    return out
```

Result:

```
Epoch [1/2], Step [100/600], Loss: 0.6213
Epoch [1/2], Step [200/600], Loss: 0.2935
Epoch [1/2], Step [300/600], Loss: 0.2289
Epoch [1/2], Step [400/600], Loss: 0.1926
Epoch [1/2], Step [500/600], Loss: 0.0635
Epoch [1/2], Step [600/600], Loss: 0.0311
Epoch [2/2], Step [100/600], Loss: 0.1164
Epoch [2/2], Step [200/600], Loss: 0.0957
Epoch [2/2], Step [300/600], Loss: 0.1021
Epoch [2/2], Step [400/600], Loss: 0.0675
Epoch [2/2], Step [500/600], Loss: 0.1220
Epoch [2/2], Step [600/600], Loss: 0.0311
Test Accuracy of the model on the 10000 test images: 97 %
```

But when I use `pack_padded_sequence`, the loss is stuck:

```python
def forward(self, x):
    # Set initial states (num_layers * 2 for bidirection)
    h0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()
    c0 = Variable(torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size)).cuda()
    # Forward propagate RNN
    pack = torch.nn.utils.rnn.pack_padded_sequence(x, batch_size * [28], batch_first=True)
    out, _ = self.lstm(pack, (h0, c0))
    # Decode hidden state of last time step
    out = out[0].view(-1, sequence_length, hidden_size * 2)
    out = self.fc(out[:, -1, :])
    return out
```

Result:

```
Epoch [1/2], Step [100/600], Loss: 2.3037
Epoch [1/2], Step [200/600], Loss: 2.3045
Epoch [1/2], Step [300/600], Loss: 2.3089
Epoch [1/2], Step [400/600], Loss: 2.2948
Epoch [1/2], Step [500/600], Loss: 2.2972
Epoch [1/2], Step [600/600], Loss: 2.3111
Epoch [2/2], Step [100/600], Loss: 2.3000
Epoch [2/2], Step [200/600], Loss: 2.2944
Epoch [2/2], Step [300/600], Loss: 2.2878
Epoch [2/2], Step [400/600], Loss: 2.2956
Epoch [2/2], Step [500/600], Loss: 2.2993
Epoch [2/2], Step [600/600], Loss: 2.3057
Test Accuracy of the model on the 10000 test images: 11 %
```

Why does this happen? As I understand it, this should work.

There is no real reason to use `pack_padded_sequence` here, since my images all have a fixed sequence length, but I am learning PyTorch and trying different things.
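For context, this is my mental model of the pack/unpack round-trip, as a standalone CPU sketch with made-up sizes (batch 3, sequence length 4, feature size 5), not my actual model code. My understanding is that `PackedSequence.data` is laid out time-step-major, so reading it back with `pad_packed_sequence` recovers the original `(batch, seq, feature)` tensor:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Fixed-length batch: (batch=3, seq=4, feature=5), batch_first layout
x = torch.randn(3, 4, 5)
packed = pack_padded_sequence(x, [4, 4, 4], batch_first=True)

# packed.data is concatenated time-step-major: all batch elements at t=0,
# then all at t=1, etc. -- it is NOT laid out as (batch, seq, feature).
print(packed.data.shape)  # torch.Size([12, 5])

# Unpacking restores the original batch-first tensor exactly.
unpacked, lengths = pad_packed_sequence(packed, batch_first=True)
print(torch.equal(unpacked, x))  # True
```

If that model is right, then viewing `packed.data` directly as `(batch, seq, hidden)` would scramble batch and time axes, but maybe I am missing something.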

Thanks!