Why can't I overfit a single example with an LSTM?

I’m experimenting with LSTMs and tried to overfit a single example, but I can’t. This seems very surprising. Can someone tell me where I’m going wrong?

Here’s the code:

import torch
import torch.nn as nn

lstm = nn.LSTM(1, 512).cuda()
linear = nn.Linear(512, 1).cuda()

in_tensor = torch.Tensor([[1, 2, 3]]).reshape(3, 1, 1).cuda()       # shape: (seq_len, batch, input_size)
output_tensor = torch.Tensor([[2, 5, 23]]).reshape(3, 1, 1).cuda()  # target for each timestep

optimizer = torch.optim.Adam(lstm.parameters(), lr=2e-3)
loss_fn = nn.MSELoss()

for i in range(100000):
    pred_out = linear(lstm(in_tensor)[0])  # lstm(...)[0] is the full output sequence
    optimizer.zero_grad()
    loss = loss_fn(pred_out, output_tensor)
    if i % 1000 == 0:
        print(i, loss)
    loss.backward()
    optimizer.step()

The loss saturates at ~100 and the model's predictions won’t change. Maybe it got stuck in a local minimum, but it feels extremely surprising that this would happen on a single example.

It is because you are only optimizing the LSTM, not the linear layer. The optimizer was constructed with lstm.parameters() only, so the linear layer never receives gradient updates and stays at its random initialization, which is why the model cannot overfit. To optimize both, you can do this:

params = list(linear.parameters()) + list(lstm.parameters())
optimizer = torch.optim.Adam(params, lr=2e-3)
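
A quick way to confirm the optimizer actually sees both modules’ weights is to count the tensors in its param_groups:

# nn.LSTM(1, 512) with one layer holds 4 tensors (input/hidden weights and biases)
# and nn.Linear(512, 1) holds 2, so this should print 6 rather than 4
print(sum(len(g['params']) for g in optimizer.param_groups))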

I tested this fix and the model overfit the data.
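
Alternatively, you can avoid this class of bug entirely by holding both layers in a single nn.Module, so that one parameters() call covers everything. A minimal sketch, assuming the same shapes as in the question (the LSTMRegressor name is mine, not from the original code):

import torch
import torch.nn as nn

class LSTMRegressor(nn.Module):
    def __init__(self, hidden_size=512):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden_size)
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)    # out: (seq_len, batch, hidden_size)
        return self.linear(out)  # project each timestep to a scalar

model = LSTMRegressor().cuda()
# parameters() recurses into registered sub-modules, so both layers are covered
optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)

Assigning sub-modules as attributes of an nn.Module is what registers them, which is why the container approach is less error-prone than concatenating parameter lists by hand.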