I’m experimenting with LSTMs and tried to overfit a single example, but I’m unable to do so. This seems very surprising. Can someone tell me where I’m going wrong?

Here’s the code:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(1, 512).cuda()
linear = nn.Linear(512, 1).cuda()
in_tensor = torch.Tensor([[1, 2, 3]]).reshape(3, 1, 1).cuda()
output_tensor = torch.Tensor([[2, 5, 23]]).reshape(3, 1, 1).cuda()
optimizer = torch.optim.Adam(lstm.parameters(), lr=2e-3)
loss_fn = nn.MSELoss()

for i in range(100000):
    pred_out = linear(lstm(in_tensor)[0])
    optimizer.zero_grad()
    loss = loss_fn(pred_out, output_tensor)
    if i % 1000 == 0:
        print(i, loss)
    loss.backward()
    optimizer.step()
```

The loss saturates at ~100 and the model’s predictions stop changing. Maybe it’s stuck in a local minimum, but it feels extremely surprising that this would happen on a single example.
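To narrow it down, I also ran a quick diagnostic sketch (same setup but on CPU so it runs anywhere; the seed and the 5-step count are arbitrary choices of mine) that checks whether each layer’s weights actually move during training:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same architecture as above, on CPU for portability
lstm = nn.LSTM(1, 512)
linear = nn.Linear(512, 1)
in_tensor = torch.tensor([1.0, 2.0, 3.0]).reshape(3, 1, 1)
output_tensor = torch.tensor([2.0, 5.0, 23.0]).reshape(3, 1, 1)

# Optimizer set up exactly as in my training loop
optimizer = torch.optim.Adam(lstm.parameters(), lr=2e-3)
loss_fn = nn.MSELoss()

# Snapshot the output layer's parameters before training
before = {name: p.detach().clone() for name, p in linear.named_parameters()}

for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(linear(lstm(in_tensor)[0]), output_tensor)
    loss.backward()
    optimizer.step()

# Report which of the linear layer's parameters moved after a few steps
for name, p in linear.named_parameters():
    print(name, "changed:", not torch.equal(before[name], p))
# → weight changed: False
# → bias changed: False
```

In my run the linear layer’s parameters never change between steps, while the LSTM’s do, so the output head appears to be frozen for the entire training run.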