Dear all,
In the context of many-to-many regression for financial forecasting, I was having trouble setting up my LSTM network: the model kept returning poor temporal predictions after a short learning phase (during which the loss decreased). In particular, the output of the LSTM was always the same, with no temporal evolution.
So I have reduced the problem to the simplest one imaginable.
The training sequence (X,Y)_t for t=0…9 is ([0],[0]), ([1],[1]), ([2],[2]), … ([9],[9]), which means I am trying to make the model learn the identity!
So I expect to get the output [0, 1, 2, … 9] from the trained model when I input [0, 1, 2, … 9].
I wrote the following code to build my data loaders of tensors:
import pandas as pd
import numpy as np
import torch

batch_size = 1

train_loader = []
for d in range(10):
    tX = np.zeros((10,batch_size,1))
    tY = np.zeros((10,batch_size,1))
    for i in range(tX.shape[0]):
        for j in range(tX.shape[1]):
            for k in range(tX.shape[2]):
                tX[i][j][k] = i
                tY[i][j][k] = i
    tX = torch.Tensor(tX).float()
    tY = torch.Tensor(tY).float()
    train_loader.append((tX,tY))

valid_loader = []
for d in range(1):
    vX = np.zeros((10,batch_size,1))
    vY = np.zeros((10,batch_size,1))
    for i in range(vX.shape[0]):
        for j in range(vX.shape[1]):
            for k in range(vX.shape[2]):
                vX[i][j][k] = i
                vY[i][j][k] = i
    vX = torch.Tensor(vX).float()
    vY = torch.Tensor(vY).float()
    valid_loader.append((vX,vY))
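For reference, the same tensors can probably be built without the nested loops. This is only a sketch: the names `steps`, `train_loader_alt` and `valid_loader_alt` are made up here, and it assumes the default `(seq_len, batch, features)` layout of `torch.nn.LSTM` (i.e. `batch_first=False`).

```python
import torch

seq_len, batch_size, n_features = 10, 1, 1

# Values 0..9 along the time axis, shaped (seq_len, batch, features).
steps = torch.arange(seq_len, dtype=torch.float32).view(seq_len, 1, 1)
X = steps.expand(seq_len, batch_size, n_features).clone()
Y = X.clone()  # identity task: the target equals the input

# Hypothetical names, mirroring train_loader / valid_loader above.
train_loader_alt = [(X, Y) for _ in range(10)]
valid_loader_alt = [(X, Y)]

print(X[:, 0, 0])
```

Every element of `train_loader_alt` is the same pair, exactly as in the loop version, so each epoch simply sees the same sequence ten times.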
The very basic model and training loop are as follows:
EPS = 0.001
NB_EPOCH = 1000
K = 1
Xdim = 1
Ydim = 1
MyLoss = torch.nn.MSELoss()
model = torch.nn.LSTM(input_size = Xdim,
                      hidden_size = Ydim,
                      num_layers = 1
                      )
optim = torch.optim.SGD(params=model.parameters(),lr=EPS)
for epoch in range(NB_EPOCH):
    # Training
    model.train()
    for tX, tY in train_loader:
        optim.zero_grad()
        X1, _ = model(tX)
        loss = MyLoss(X1.view(-1,Ydim),tY.view(-1,Ydim))
        loss.backward()
        optim.step()
    if epoch % 100 == 0:
        # Validation
        print('Epoch ===', epoch, end='', flush=True)
        model.eval()
        cumloss = 0
        with torch.no_grad():
            for vX, vY in valid_loader:
                X1, _ = model(vX)
                cumloss += MyLoss(X1.view(-1,Ydim),vY.view(-1,Ydim))
        print('cumloss = ', cumloss)

# Output example from the last vX
model.eval()
Y, (h, c) = model(vX)
print('input = ',vX.permute(1,0,2))
print('output Y = ', Y.permute(1,0,2))
The result is this very strange output from the hidden layer at times 0, 1, 2, … 9:
output Y = tensor([[[0.2918], [0.7336], [0.9321], [0.9848], [0.9968], [0.9993], [0.9999], [1.0000], [1.0000], [1.0000]]], grad_fn=)
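One thing that may be relevant to the saturation at 1.0: the value returned by `torch.nn.LSTM` is the hidden state h_t = o_t * tanh(c_t), where o_t is a sigmoid gate, so each component stays within [-1, 1]. If that is what is happening here, targets 2 to 9 can never be reached, and the best the raw output could do is predict min(t, 1). The sketch below checks the bound on a freshly initialized LSTM and computes the MSE that such a capped prediction would give; the names `lstm`, `best_capped` and `floor` are only illustrative.

```python
import torch

torch.manual_seed(0)

# Same shape of problem as above: one feature, one hidden unit.
lstm = torch.nn.LSTM(input_size=1, hidden_size=1, num_layers=1)
x = torch.arange(10, dtype=torch.float32).view(10, 1, 1)

with torch.no_grad():
    out, _ = lstm(x)

# The raw output is o_t * tanh(c_t), so its magnitude cannot exceed 1.
max_abs = out.abs().max().item()
print('largest |output| =', max_abs)

# Best case if predictions are capped at 1: predict min(t, 1).
target = x.view(-1)
best_capped = torch.clamp(target, max=1.0)
floor = torch.mean((target - best_capped) ** 2).item()
print('MSE of the capped prediction =', floor)
```

The capped prediction gives an MSE of (1 + 4 + 9 + 16 + 25 + 36 + 49 + 64) / 10 = 20.4, which is at least suggestive when compared with the validation loss reported by the run.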
Complete output from the code:
Epoch === 0cumloss = tensor(24.1298)
Epoch === 100cumloss = tensor(22.2144)
Epoch === 200cumloss = tensor(20.5057)
Epoch === 300cumloss = tensor(20.4749)
Epoch === 400cumloss = tensor(20.4618)
Epoch === 500cumloss = tensor(20.4541)
Epoch === 600cumloss = tensor(20.4489)
Epoch === 700cumloss = tensor(20.4452)
Epoch === 800cumloss = tensor(20.4424)
Epoch === 900cumloss = tensor(20.4402)
input = tensor([[[0.],
[1.],
[2.],
[3.],
[4.],
[5.],
[6.],
[7.],
[8.],
[9.]]])
output Y = tensor([[[0.2918],
[0.7336],
[0.9321],
[0.9848],
[0.9968],
[0.9993],
[0.9999],
[1.0000],
[1.0000],
[1.0000]]], grad_fn=)
So am I making a basic coding mistake, such as not initializing something, or am I fundamentally misunderstanding how an LSTM should behave?
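A variant that might be worth trying, in case the bounded hidden state is the issue, is to put a linear layer on top of the LSTM so that the final output is not restricted to [-1, 1]. This is only a sketch under assumptions that are not in the code above: the class name `LSTMRegressor`, the hidden size of 16, the Adam optimizer and its learning rate are all arbitrary choices for illustration, and nothing is claimed about how close the result gets to the identity.

```python
import torch

torch.manual_seed(0)

class LSTMRegressor(torch.nn.Module):
    """Hypothetical wrapper: LSTM followed by an unbounded linear read-out."""
    def __init__(self, in_dim=1, hidden_dim=16, out_dim=1):
        super().__init__()
        self.lstm = torch.nn.LSTM(input_size=in_dim, hidden_size=hidden_dim)
        self.head = torch.nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        h, _ = self.lstm(x)   # (seq_len, batch, hidden_dim), values in [-1, 1]
        return self.head(h)   # (seq_len, batch, out_dim), no range restriction

# Same identity task as above: inputs and targets are 0..9.
x = torch.arange(10, dtype=torch.float32).view(10, 1, 1)
y = x.clone()

net = LSTMRegressor()
loss_fn = torch.nn.MSELoss()
opt = torch.optim.Adam(net.parameters(), lr=0.01)  # assumed optimizer settings

losses = []
for step in range(300):
    opt.zero_grad()
    loss = loss_fn(net(x), y)
    loss.backward()
    opt.step()
    losses.append(loss.item())

print('first loss =', losses[0], 'last loss =', losses[-1])
```

Comparing the first and last loss values gives a quick indication of whether the read-out layer lets the model get past the plateau seen with the raw LSTM output.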