Problems with how LSTMs work

I am new to LSTMs and would like to clarify some concepts. I know that LSTMs work well on time-series data, and I want to try building an LSTM model. I will not use LSTMCell here, because there may be things about cells that I don't understand yet.
I would like to ask whether Run1 and Run2 in the following code are equivalent:

import torch

data = torch.ones(4, 100, 5)  # batch, time length, feature
lstm = torch.nn.LSTM(
    input_size=5, hidden_size=16, num_layers=1, 
    batch_first=True, bidirectional=False)

def init_variable(_lstm, batchsize):
    # shape: (num_layers * num_directions, batch_size, hidden_size)
    h_ = torch.zeros(_lstm.num_layers*pow(2,_lstm.bidirectional), 
                     batchsize, 
                     _lstm.hidden_size).requires_grad_()
    c_ = torch.zeros(_lstm.num_layers*pow(2,_lstm.bidirectional), 
                     batchsize, 
                     _lstm.hidden_size).requires_grad_()
    return h_, c_

def Run1(data=data, lstm=lstm):
    (h, c) = init_variable(lstm, data.shape[0])
    out, (h, c) = lstm(data, (h, c))
    return h

def Run2(data=data, lstm=lstm):
    (h, c) = init_variable(lstm, data.shape[0])
    for i in range(data.shape[1]):
        out, (h, c) = lstm(data[:,i,:].unsqueeze(1), (h, c))
    return h
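For comparison, the one-step-per-call loop in Run2 is what nn.LSTMCell is built for: a cell processes a single time step, and its states have shape (batch, hidden_size), without the layer dimension. The sketch below assumes random input (instead of ones) and copies the weights of the LSTM's layer 0 into the cell so that both compute the same function; the helper name run_cell is hypothetical.

```python
import torch

torch.manual_seed(0)

# Assumed setup mirroring the question: batch=4, seq_len=100, features=5, hidden=16.
data = torch.randn(4, 100, 5)
lstm = torch.nn.LSTM(input_size=5, hidden_size=16, num_layers=1, batch_first=True)

# An LSTMCell processes exactly one time step; give it the same weights as layer 0.
cell = torch.nn.LSTMCell(input_size=5, hidden_size=16)
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

def run_cell(data, cell):
    # Hypothetical helper. LSTMCell states have no layer dimension: (batch, hidden_size)
    h = torch.zeros(data.shape[0], cell.hidden_size)
    c = torch.zeros(data.shape[0], cell.hidden_size)
    for t in range(data.shape[1]):
        # input for a single step: (batch, input_size)
        h, c = cell(data[:, t, :], (h, c))
    return h

_, (h_lstm, _) = lstm(data)  # omitting (h_0, c_0) means zero initial states
h_cell = run_cell(data, cell)
print(torch.allclose(h_lstm[0], h_cell, atol=1e-5))
```

The cell has no num_layers argument, so a multi-layer model built from cells means chaining several of them by hand.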

Thanks in advance!

Yes, that seems to be the case. Using your code, I get a small absolute error, which is expected given the limited floating-point precision:

out1 = Run1()
out2 = Run2()

print((out1 - out2).abs().max())
# tensor(7.4506e-09, grad_fn=<MaxBackward1>)
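A related detail that helps when comparing the two runs: in a unidirectional nn.LSTM, `out` holds the top layer's hidden state at every time step, while `h` holds every layer's hidden state at the last time step only. So the last step of `out` should match `h[-1]`. A small sketch, assuming random weights and input:

```python
import torch

torch.manual_seed(0)
data = torch.randn(4, 100, 5)  # batch, time length, feature
lstm = torch.nn.LSTM(input_size=5, hidden_size=16, num_layers=2, batch_first=True)

out, (h, c) = lstm(data)
print(out.shape)  # torch.Size([4, 100, 16]): top layer, every time step
print(h.shape)    # torch.Size([2, 4, 16]): every layer, last time step only

# The last time step of `out` is the final hidden state of the top layer.
print(torch.allclose(out[:, -1], h[-1]))
```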

Hi @ptrblck,
Thanks for your reply.
In addition, I tried using more than one layer (num_layers=2 below), and the results differ. Is this because the state entering the second layer is not the same?
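One way to see how the state flows between layers: a 2-layer nn.LSTM behaves like two 1-layer LSTMs stacked, where the second layer consumes the first layer's output sequence, and `h`/`c` carry one slice per layer. The sketch below (random data is an assumption) copies the parameters named weight_ih_l0, weight_ih_l1, etc. from a 2-layer LSTM into two 1-layer LSTMs and checks that the outputs agree. Since Run2 passes the full (h, c) for all layers into every call, each layer keeps its own state between steps, so layer 2 should see the same inputs in both runs.

```python
import torch

torch.manual_seed(0)
data = torch.randn(4, 100, 5)  # batch, time length, feature
deep = torch.nn.LSTM(input_size=5, hidden_size=16, num_layers=2, batch_first=True)

# Two single-layer LSTMs that reuse the weights of deep's layer 0 and layer 1.
layer0 = torch.nn.LSTM(input_size=5, hidden_size=16, num_layers=1, batch_first=True)
layer1 = torch.nn.LSTM(input_size=16, hidden_size=16, num_layers=1, batch_first=True)

with torch.no_grad():
    for k, single in enumerate((layer0, layer1)):
        for name in ("weight_ih", "weight_hh", "bias_ih", "bias_hh"):
            # e.g. deep.weight_ih_l1 -> layer1.weight_ih_l0
            getattr(single, f"{name}_l0").copy_(getattr(deep, f"{name}_l{k}"))

out_deep, (h_deep, c_deep) = deep(data)

out0, (h0, c0) = layer0(data)  # layer 0 reads the raw features
out1, (h1, c1) = layer1(out0)  # layer 1 reads layer 0's output sequence

print(torch.allclose(out_deep, out1, atol=1e-5))
# h_deep stacks the final hidden state of each layer along dim 0
print(torch.allclose(h_deep, torch.cat([h0, h1]), atol=1e-5))
```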

lstm = torch.nn.LSTM(
    input_size=5, hidden_size=16, num_layers=2, 
    batch_first=True, bidirectional=False)

a = Run1()
b = Run2()
print(torch.equal(a, b)) # -> False
print(torch.equal(a[0], b[0])) # -> True
print(torch.equal(a[1], b[1])) # -> False

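Don't use torch.equal: it checks for exact (bitwise) equality, and floating-point operations introduce small rounding errors due to their limited precision. Compare the max. absolute difference instead (or use torch.allclose).
I still get the same outputs for num_layers=2: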
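A self-contained sketch of such a tolerance-based comparison for num_layers=2, assuming the question's Run1/Run2 are changed to return both `out` and `h` (the names run_full, run_steps and zero_state are hypothetical) and using random input instead of ones:

```python
import torch

torch.manual_seed(0)
data = torch.randn(4, 100, 5)  # batch, time length, feature
lstm = torch.nn.LSTM(input_size=5, hidden_size=16, num_layers=2, batch_first=True)

def zero_state(lstm, batch_size):
    # (num_layers * num_directions, batch, hidden_size)
    shape = (lstm.num_layers * (2 if lstm.bidirectional else 1), batch_size, lstm.hidden_size)
    return torch.zeros(shape), torch.zeros(shape)

def run_full(data, lstm):
    # Hypothetical variant of Run1: the whole sequence in one call.
    out, (h, c) = lstm(data, zero_state(lstm, data.shape[0]))
    return out, h

def run_steps(data, lstm):
    # Hypothetical variant of Run2: one time step per call, carrying (h, c) for all layers.
    h, c = zero_state(lstm, data.shape[0])
    steps = []
    for t in range(data.shape[1]):
        out, (h, c) = lstm(data[:, t : t + 1, :], (h, c))  # out: (batch, 1, hidden)
        steps.append(out)
    return torch.cat(steps, dim=1), h

out1, h1 = run_full(data, lstm)
out2, h2 = run_steps(data, lstm)

print((out1 - out2).abs().max())  # tiny, but not guaranteed to be exactly zero
print(torch.allclose(out1, out2, atol=1e-5), torch.allclose(h1, h2, atol=1e-5))
```

This equivalence only holds for a unidirectional LSTM: with bidirectional=True, the backward direction would see a single step per call instead of the whole sequence.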

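# assumes Run1 and Run2 were changed to end with `return out, h`
out1, h1 = Run1()
out2, h2 = Run2()

# out1: (batch, seq_len, hidden); out2 holds only the last step: (batch, 1, hidden).
# The subtraction broadcasts across batch rows, which is harmless here since every row is identical;
# out2[:, -1] would pair the rows one-to-one.
print((out1[:, -1] - out2).abs().max())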
# tensor(2.9802e-08, grad_fn=<MaxBackward1>)

print((h1 - h2).abs().max())
# tensor(2.9802e-08, grad_fn=<MaxBackward1>)

Dear @ptrblck, could you please take a look at my question?