Hi,

I’m implementing a Seq2Seq decoder, and the paper says that `LSTMCell` should be used in a loop, one call per decoder step. However, I’m not convinced why `LSTM` itself could not be used instead: I’d expect the two procedures to yield the same sequence of hidden states when the `LSTM` is fed one timestep at a time. Here’s the code that I thought would give identical results, but it doesn’t.

I thought the difference might come from non-determinism in the `LSTM` layer, as mentioned in the documentation, so I also set all the random seeds I could find and enabled deterministic algorithms, but that doesn’t seem to be the problem.

```
import random
import torch
import numpy as np
from torch import nn
random.seed(0)
torch.manual_seed(100)
np.random.seed(0)
torch.use_deterministic_algorithms(True)
lstm = nn.LSTM(input_size=5, hidden_size=5, num_layers=1)
lstm_cell = nn.LSTMCell(5, 5)
h0 = torch.Tensor([0.1, 0.2, 0.12, -0.3, 0.1]).unsqueeze(0)
c0 = torch.Tensor([0.1, 0.2, 0.12, -0.3, 0.1]).unsqueeze(0)
x = torch.randn(8, 5)
with torch.no_grad():
    # run the full LSTM one timestep at a time
    h, c = h0, c0
    for i in range(8):
        _, (h, c) = lstm(x[i].unsqueeze(0), (h, c))
        print(f"{h=} {c=}")
    print("\n")
    # run the LSTMCell over the same inputs
    h, c = h0, c0
    for i in range(8):
        h, c = lstm_cell(x[i].unsqueeze(0), (h, c))
        print(f"{h=} {c=}")
```
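One thing I suspect (not verified yet): `nn.LSTM` and `nn.LSTMCell` are two separate modules, so even with the seeds fixed they are initialized with different weights, and a single global seed doesn’t make them share parameters. A sketch of a fairer comparison, copying the LSTM’s layer-0 parameters into the cell (parameter names `weight_ih_l0`/`weight_ih` etc. as documented for `nn.LSTM` and `nn.LSTMCell`):

```python
import torch
from torch import nn

torch.manual_seed(100)
lstm = nn.LSTM(input_size=5, hidden_size=5, num_layers=1)
lstm_cell = nn.LSTMCell(5, 5)

# Make both modules use identical parameters: copy the single-layer LSTM's
# weights and biases into the cell before comparing outputs.
with torch.no_grad():
    lstm_cell.weight_ih.copy_(lstm.weight_ih_l0)
    lstm_cell.weight_hh.copy_(lstm.weight_hh_l0)
    lstm_cell.bias_ih.copy_(lstm.bias_ih_l0)
    lstm_cell.bias_hh.copy_(lstm.bias_hh_l0)

h0 = torch.zeros(1, 5)
c0 = torch.zeros(1, 5)
x = torch.randn(8, 5)

with torch.no_grad():
    h1, c1 = h0, c0  # LSTM states
    h2, c2 = h0, c0  # LSTMCell states
    for i in range(8):
        _, (h1, c1) = lstm(x[i].unsqueeze(0), (h1, c1))
        h2, c2 = lstm_cell(x[i].unsqueeze(0), (h2, c2))

print(torch.allclose(h1, h2), torch.allclose(c1, c2))
```

If the states agree after this, the original discrepancy is just the independent initialization, not the step-by-step looping itself.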