LSTMcell and LSTM returning different outputs

arvindmohan · October 7, 2018, 10:15pm

I am new to Pytorch and trying to understand the LSTM implementation. So I create an input with unit length and pass it through a) LSTMCell and b) LSTM

My understanding is LSTM is basically stacking LSTMCell in an efficient manner, so if I manually provide both the same input (all other parameters like hidden size etc. remaining the same) the output should be exactly the same for both. I have inserted by standalone code below and I am getting different results. Can someone please point to what the issue is?

""" Comparison LSTMCell and LSTM """

import numpy as np
import torch

# get SAME input for both cases.
inpsize=1
batch=1
hiddensize=5
seq_len=1
nlayers=1
torch.manual_seed(999)

x_cell= torch.randn(seq_len,batch,inpsize)
h0_cell= torch.randn(batch,hiddensize)
c0_cell= torch.randn(batch,hiddensize)

x_layer = x_cell.clone()
h0_layer = h0_cell.clone()
c0_layer = c0_cell.clone()
h0_layer = h0_layer.view(nlayers,batch,hiddensize)
c0_layer = c0_layer.view(nlayers,batch,hiddensize)

# LSTM Cell Stacked into layers
lstm= torch.nn.LSTMCell(inpsize,hiddensize)
out_cell=[]
states_cell=[]
ht_cell, ct_cell = h0_cell.clone() , c0_cell.clone()
for i in range(seq_len):
    ht_cell,ct_cell = lstm(x_cell[i],(ht_cell,ct_cell))
    out_cell.append(ht_cell)
    states_cell.append(ct_cell)
print('output cell is', out_cell)
print('states cell is', states_cell)

# full LSTM
full_lstm = torch.nn.LSTM(inpsize, hiddensize, nlayers)
out_layer, (ht_layer,ct_layer) = full_lstm(x_layer, (h0_layer,c0_layer))
print('Layer Output is', out_layer)
print('ht layer Output is', ht_layer)
print('ct layer Output is', ct_layer)

# print('Input is', x_cell)
# print('Cell Output is', out_cell)
# print('Layer Output is', out_layer)

The output reads:

Done
output cell is [tensor([[-0.1558,  0.1880, -0.3255, -0.0062,  0.0196]], grad_fn=<ThMulBackward>)]
state cell is [tensor([[-0.3916,  0.4230, -0.7570, -0.0181,  0.0513]], grad_fn=<ThAddBackward>)]
Layer Output is tensor([[[-0.0504,  0.2765, -0.1421,  0.1251,  0.0082]]], grad_fn=<CatBackward>)
ht layer Output is tensor([[[-0.0504,  0.2765, -0.1421,  0.1251,  0.0082]]], grad_fn=<ViewBackward>)
ct layer Output is tensor([[[-0.0801,  0.4966, -0.3166,  0.2123,  0.0124]]], grad_fn=<ViewBackward>)

If I am using 1 layer of seq_len= 1, exactly the same inputs, shouldn’t LSTM be the same as a LSTMCell? I am confused.
Thanks for your time.

InnovArul · October 8, 2018, 4:09am

In your code, the weight initialization of both LSTMCell, LSTM are not same.
Initialize seed manually before creating the instances.

torch.manual_seed(999)
lstm= torch.nn.LSTMCell(inpsize,hiddensize)

torch.manual_seed(999)
full_lstm = torch.nn.LSTM(inpsize, hiddensize, nlayers)

arvindmohan · October 8, 2018, 5:07am

Arul - That worked, thanks much. But since I had already set the seed at the top, I was hoping that it gets reflected throughout the code - looks like it isn’t the case.

jlquinn · October 10, 2018, 6:46pm

The reason is that the random number generator changes its state with each number generated. You can think of it as creating a very long list of pseudorandom numbers when the seed is set, then successive calls return the next number in the sequence.

Without resetting the seed when you initialize the second LSTM, you’re choosing a different starting place in the sequence, so the numbers will be different.

arvindmohan · October 10, 2018, 9:51pm

I see - thanks for the clarification.