Why does the `batch_first` option in LSTM give different results in this case?

Hello.

I wrote a simple script to understand the class torch.nn.LSTM in PyTorch.
When I use the batch_first option, I change the input's axes from (seq_len, batch, input_size) to (batch, seq_len, input_size).
However, I don't understand why I get a different result with the batch_first option.

Here is my code.

import torch
import torch.autograd as autograd
import torch.nn as nn

torch.manual_seed(1)

lstm = nn.LSTM(input_size=3, hidden_size=3, num_layers=1)                     # expects (seq_len, batch, input_size)
lstm2 = nn.LSTM(input_size=3, hidden_size=3, num_layers=1, batch_first=True)  # expects (batch, seq_len, input_size)

inputs = autograd.Variable(torch.randn(30))
h0 = autograd.Variable(torch.randn(1, 2, 3))
c0 = autograd.Variable(torch.randn(1, 2, 3))

inputs1 = inputs.view(5, 2, -1).contiguous()           # (seq_len=5, batch=2, input_size=3)
inputs2 = torch.transpose(inputs1, 0, 1).contiguous()  # (batch=2, seq_len=5, input_size=3)

out = lstm(inputs1, (h0, c0))[0]
print("Case 1")
print(torch.transpose(inputs1, 0, 1).contiguous())
print(torch.transpose(out, 0, 1).contiguous())

print("#######"*5)

out = lstm2(inputs2, (h0, c0))[0]
print("Case 2")
print(inputs2)
print(out)

And here is the result.

Case 1
Variable containing:
(0 ,.,.) = 
  1.4114 -0.9804 -0.7578
 -0.4270 -0.3868 -0.6089
  1.1848 -1.0322 -0.7039
 -0.8018 -0.7855  0.7877
 -0.4594 -1.1798  0.3812

(1 ,.,.) = 
 -0.3782  1.7211  0.0310
  1.1652 -0.1326 -0.0228
  0.8813  1.4276 -0.9245
  0.0786  1.7053 -0.8098
 -0.0064  0.5302  0.9990
[torch.FloatTensor of size 2x5x3]

Variable containing:
(0 ,.,.) = 
  0.0122 -0.0571 -0.0294
  0.0310 -0.0025  0.2622
 -0.0092 -0.1369  0.1579
 -0.0573 -0.0152  0.2817
 -0.1080  0.0052  0.2817

(1 ,.,.) = 
  0.0885  0.0426  0.3910
  0.0516 -0.0430  0.2333
  0.0789  0.0719  0.1446
  0.1162  0.1515  0.2469
  0.1018  0.0987  0.3232
[torch.FloatTensor of size 2x5x3]

###################################
Case 2
Variable containing:
(0 ,.,.) = 
  1.4114 -0.9804 -0.7578
 -0.4270 -0.3868 -0.6089
  1.1848 -1.0322 -0.7039
 -0.8018 -0.7855  0.7877
 -0.4594 -1.1798  0.3812

(1 ,.,.) = 
 -0.3782  1.7211  0.0310
  1.1652 -0.1326 -0.0228
  0.8813  1.4276 -0.9245
  0.0786  1.7053 -0.8098
 -0.0064  0.5302  0.9990
[torch.FloatTensor of size 2x5x3]

Variable containing:
(0 ,.,.) = 
  0.1568 -0.2322  0.0824
  0.1147 -0.0394  0.1534
  0.0238 -0.1611 -0.0544
  0.0642 -0.0818  0.0314
  0.0626 -0.1119  0.0638

(1 ,.,.) = 
 -0.0989  0.0636 -0.1731
 -0.0746 -0.0633 -0.3005
  0.0611  0.0328 -0.3500
  0.1299  0.1404 -0.1330
  0.1082 -0.0088 -0.0390
[torch.FloatTensor of size 2x5x3]

I really want to understand this.
Thanks in advance.

Oh, I solved it: I just had to call torch.manual_seed(1) again before defining each LSTM. Without the re-seed, the two LSTMs were initialized with different random weights, so of course their outputs differed.
Now I can see that batch_first works as I expected.
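For anyone who lands here later, this is a minimal sketch of the check (re-seeding before each constructor so both LSTMs draw identical initial weights; the tensor values here are illustrative, not the ones from my run). With equal weights, the batch_first output is exactly the transpose of the default-layout output:

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
lstm = nn.LSTM(input_size=3, hidden_size=3, num_layers=1)

# Re-seed so the second LSTM is initialized with the same random weights
torch.manual_seed(1)
lstm2 = nn.LSTM(input_size=3, hidden_size=3, num_layers=1, batch_first=True)

inputs1 = torch.randn(5, 2, 3)                  # (seq_len, batch, input_size)
inputs2 = inputs1.transpose(0, 1).contiguous()  # (batch, seq_len, input_size)

# Hidden/cell states are (num_layers, batch, hidden_size) regardless of batch_first
h0 = torch.randn(1, 2, 3)
c0 = torch.randn(1, 2, 3)

out1, _ = lstm(inputs1, (h0, c0))
out2, _ = lstm2(inputs2, (h0, c0))

# batch_first only changes the layout, not the computation
print(torch.allclose(out1.transpose(0, 1), out2, atol=1e-6))  # True
```

Note that batch_first only affects the layout of the input and output tensors; the (h0, c0) states keep the (num_layers, batch, hidden_size) shape in both cases.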
