I am new to PyTorch and trying to understand the LSTM implementation. So I create an input of unit length and pass it through (a) LSTMCell and (b) LSTM.
My understanding is that LSTM is basically an LSTMCell applied over the sequence (and stacked into layers) in an efficient manner, so if I manually give both the same input (with all other parameters, such as hidden size, remaining the same), the output should be exactly the same for both. I have inserted my standalone code below, and I am getting different results. Can someone please point out what the issue is?
""" Comparison LSTMCell and LSTM """
import numpy as np
import torch
# get SAME input for both cases.
inpsize = 1
batch = 1
hiddensize = 5
seq_len = 1
nlayers = 1
torch.manual_seed(999)
x_cell = torch.randn(seq_len, batch, inpsize)
h0_cell = torch.randn(batch, hiddensize)
c0_cell = torch.randn(batch, hiddensize)
x_layer = x_cell.clone()
h0_layer = h0_cell.clone()
c0_layer = c0_cell.clone()
h0_layer = h0_layer.view(nlayers, batch, hiddensize)
c0_layer = c0_layer.view(nlayers, batch, hiddensize)
# LSTM Cell Stacked into layers
lstm = torch.nn.LSTMCell(inpsize, hiddensize)
out_cell = []
states_cell = []
ht_cell, ct_cell = h0_cell.clone(), c0_cell.clone()
for i in range(seq_len):
    ht_cell, ct_cell = lstm(x_cell[i], (ht_cell, ct_cell))
    out_cell.append(ht_cell)
    states_cell.append(ct_cell)
print('output cell is', out_cell)
print('states cell is', states_cell)
# full LSTM
full_lstm = torch.nn.LSTM(inpsize, hiddensize, nlayers)
out_layer, (ht_layer, ct_layer) = full_lstm(x_layer, (h0_layer, c0_layer))
print('Layer Output is', out_layer)
print('ht layer Output is', ht_layer)
print('ct layer Output is', ct_layer)
# print('Input is', x_cell)
# print('Cell Output is', out_cell)
# print('Layer Output is', out_layer)
The output reads:
output cell is [tensor([[-0.1558, 0.1880, -0.3255, -0.0062, 0.0196]], grad_fn=<ThMulBackward>)]
states cell is [tensor([[-0.3916, 0.4230, -0.7570, -0.0181, 0.0513]], grad_fn=<ThAddBackward>)]
Layer Output is tensor([[[-0.0504, 0.2765, -0.1421, 0.1251, 0.0082]]], grad_fn=<CatBackward>)
ht layer Output is tensor([[[-0.0504, 0.2765, -0.1421, 0.1251, 0.0082]]], grad_fn=<ViewBackward>)
ct layer Output is tensor([[[-0.0801, 0.4966, -0.3166, 0.2123, 0.0124]]], grad_fn=<ViewBackward>)
If I am using one layer with seq_len = 1 and exactly the same inputs, shouldn't the LSTM behave exactly like an LSTMCell? I am confused.
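In case it helps narrow things down, here is a minimal sketch of what I tried: since each module draws its own random weights at construction time, I copy the cell's parameters into the LSTM (using the parameter names from the PyTorch docs: `weight_ih`/`weight_hh`/`bias_ih`/`bias_hh` on LSTMCell, `weight_ih_l0`/`weight_hh_l0`/`bias_ih_l0`/`bias_hh_l0` on LSTM) before comparing. My expectation is that any remaining difference would then come from something other than initialization:

```python
import torch

torch.manual_seed(999)
inpsize, batch, hiddensize, seq_len, nlayers = 1, 1, 5, 1, 1

x = torch.randn(seq_len, batch, inpsize)
h0 = torch.randn(batch, hiddensize)
c0 = torch.randn(batch, hiddensize)

cell = torch.nn.LSTMCell(inpsize, hiddensize)
layer = torch.nn.LSTM(inpsize, hiddensize, nlayers)

# Each module initializes its own random weights, so force them to match:
# LSTMCell's weight_ih/weight_hh/bias_ih/bias_hh correspond to the
# LSTM's weight_ih_l0/weight_hh_l0/bias_ih_l0/bias_hh_l0 (layer 0).
with torch.no_grad():
    layer.weight_ih_l0.copy_(cell.weight_ih)
    layer.weight_hh_l0.copy_(cell.weight_hh)
    layer.bias_ih_l0.copy_(cell.bias_ih)
    layer.bias_hh_l0.copy_(cell.bias_hh)

ht, ct = cell(x[0], (h0, c0))
out, (hn, cn) = layer(x, (h0.view(nlayers, batch, hiddensize),
                          c0.view(nlayers, batch, hiddensize)))

# expect True if the only difference was the random initialization
print(torch.allclose(ht, out[0]), torch.allclose(ct, cn[0]))
```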
Thanks for your time.