LSTM basics: many-to-many regression

Dear all,

In the context of many-to-many regression for financial forecasting, I was having trouble setting up my LSTM network: the model kept returning poor temporal predictions after a short learning phase (the loss was decreasing). I should mention that the output of the LSTM was always the same, with no temporal evolution.
So I simplified the problem down to the simplest setup imaginable: the training sequence (X, Y)_t for t = 0…9 is ([0],[0]), ([1],[1]), ([2],[2]), … ([9],[9]). In other words, I am trying to make the model learn the identity function!

So after training I expect the model to return [0, 1, 2, … 9] when I feed it [0, 1, 2, … 9].

I wrote the following code to build my loaders of tensors:

import numpy as np
import torch

batch_size = 1
seq_len = 10

# torch.nn.LSTM expects inputs of shape (seq_len, batch, input_size) by default.
# Each sample is the ramp 0, 1, ..., 9 and the target is the same ramp (identity).
train_loader = []
for d in range(10):
    ramp = np.tile(np.arange(seq_len, dtype=np.float32).reshape(seq_len, 1, 1),
                   (1, batch_size, 1))
    tX = torch.from_numpy(ramp.copy())
    tY = torch.from_numpy(ramp.copy())
    train_loader.append((tX, tY))

valid_loader = []
for d in range(1):
    ramp = np.tile(np.arange(seq_len, dtype=np.float32).reshape(seq_len, 1, 1),
                   (1, batch_size, 1))
    vX = torch.from_numpy(ramp.copy())
    vY = torch.from_numpy(ramp.copy())
    valid_loader.append((vX, vY))
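
For reference, torch.nn.LSTM expects inputs shaped (seq_len, batch, input_size) by default, so every sample above should come out as (10, 1, 1). A quick sanity check (purely illustrative):

tX, tY = train_loader[0]
print(tX.shape, tY.shape)      # both torch.Size([10, 1, 1])
print(tX.squeeze().tolist())   # the ramp [0.0, 1.0, ..., 9.0]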

Then the model and training loop, as basic as it gets:

EPS = 0.001       # learning rate
NB_EPOCH = 1000
Xdim = 1          # input feature size
Ydim = 1          # target size: the raw LSTM hidden state is read as the prediction

MyLoss = torch.nn.MSELoss()

model = torch.nn.LSTM(input_size=Xdim,
                      hidden_size=Ydim,
                      num_layers=1)
optim = torch.optim.SGD(params=model.parameters(), lr=EPS)
 
for epoch in range(NB_EPOCH):
    # Training
    model.train()
    for tX, tY in train_loader:
        optim.zero_grad()
        X1, _ = model(tX)
        loss = MyLoss(X1.view(-1,Ydim),tY.view(-1,Ydim))
        loss.backward()
        optim.step()

    if epoch % 100 == 0:
        # Validation
        print('Epoch ===', epoch, end=' ', flush=True)
        model.eval()
        cumloss = 0
        with torch.no_grad():
            for vX, vY in valid_loader:
                X1, _ = model(vX)
                cumloss += MyLoss(X1.view(-1,Ydim),vY.view(-1,Ydim))
        print('cumloss = ', cumloss) 
    
# Example output on the last validation sequence vX
model.eval()
Y, (h, c) = model(vX)
print('input = ',vX.permute(1,0,2))
print('output Y = ', Y.permute(1,0,2))
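
As a side note, as I understand the docs, the final states h and c returned here should both have shape (num_layers, batch, hidden_size) = (1, 1, 1); a quick check (illustrative only):

print(h.shape, c.shape)   # expecting torch.Size([1, 1, 1]) for both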

Instead of the ramp, I get this very strange output from the hidden layer at time steps 0, 1, 2, … 9; it rises quickly and then saturates at 1.0:

output Y = tensor([[[0.2918], [0.7336], [0.9321], [0.9848], [0.9968], [0.9993], [0.9999], [1.0000], [1.0000], [1.0000]]], grad_fn=<...>)

Complete output from the code:

Epoch === 0 cumloss = tensor(24.1298)
Epoch === 100 cumloss = tensor(22.2144)
Epoch === 200 cumloss = tensor(20.5057)
Epoch === 300 cumloss = tensor(20.4749)
Epoch === 400 cumloss = tensor(20.4618)
Epoch === 500 cumloss = tensor(20.4541)
Epoch === 600 cumloss = tensor(20.4489)
Epoch === 700 cumloss = tensor(20.4452)
Epoch === 800 cumloss = tensor(20.4424)
Epoch === 900 cumloss = tensor(20.4402)
input = tensor([[[0.],
[1.],
[2.],
[3.],
[4.],
[5.],
[6.],
[7.],
[8.],
[9.]]])
output Y = tensor([[[0.2918],
[0.7336],
[0.9321],
[0.9848],
[0.9968],
[0.9993],
[0.9999],
[1.0000],
[1.0000],
[1.0000]]], grad_fn=<...>)
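
One more observation: no matter what I feed in, the output never seems to climb past 1.0. For instance, this quick check (hypothetical, reusing the trained model) shows the same plateau even for huge inputs:

with torch.no_grad():
    big, _ = model(torch.full((10, 1, 1), 100.0))
print(big.squeeze())   # still saturates around 1.0 instead of tracking the input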

So, am I making a basic coding mistake, like failing to initialize something, or am I fundamentally misunderstanding how an LSTM should behave?
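
For completeness: I also wondered whether the usual pattern is to put a linear layer on top of the LSTM rather than reading the hidden state directly. Here is a sketch of what I have in mind (my own assumption, not validated):

class LSTMRegressor(torch.nn.Module):
    def __init__(self, input_size=1, hidden_size=16, output_size=1):
        super().__init__()
        self.lstm = torch.nn.LSTM(input_size=input_size, hidden_size=hidden_size)
        # Map each hidden state to the regression target at every time step.
        self.head = torch.nn.Linear(hidden_size, output_size)

    def forward(self, x):        # x: (seq_len, batch, input_size)
        h, _ = self.lstm(x)      # h: (seq_len, batch, hidden_size)
        return self.head(h)      # (seq_len, batch, output_size)

But before going that route, I would like to understand why the plain LSTM above cannot fit even this trivial ramp.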