Hello everyone! I’m having a bit of a problem trying to implement a small LSTM network.
I’m using sequences of 20 values as input, and the network has to predict a corresponding output. The data is scaled between 0 and 1, and the dataset looks like this:
Input:
array([[0.3616897 , 0.50186179, 0.46220047, 0.48337192],
[0.38939199, 0.5308964 , 0.47071214, 0.48807264],
[0.43114892, 0.55613415, 0.47903991, 0.49106299],
...,
[0.48847856, 0.55368452, 0.48759646, 0.48916795],
[0.49450675, 0.57330196, 0.48922357, 0.49509893],
[0.49728463, 0.58997826, 0.49048734, 0.50129733]])
Output:
array([[0.33857308, 0.50931249],
[0.3834156 , 0.53883397],
[0.42320043, 0.5688971 ],
...,
[0.48872479, 0.56165727],
[0.50050588, 0.58804321],
[0.49346006, 0.59605735]])
So I would use the first 20 values of Input as the first sequence, and the desired output would be the 20th element of Output.
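To make the pairing concrete, here is a minimal sketch of the windowing I have in mind (plain NumPy; make_windows is just a hypothetical helper name):

import numpy as np

def make_windows(X, y, window=20):
    # pair each length-`window` slice of X with the output at its last step
    seqs, targets = [], []
    for i in range(len(X) - window + 1):
        seqs.append(X[i:i + window])       # shape (window, 4)
        targets.append(y[i + window - 1])  # shape (2,)
    return np.stack(seqs), np.stack(targets)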
This is what my network class looks like:
import torch.nn as nn
import torch.nn.functional as F

class neuralNet(nn.Module):
    def __init__(self):
        super(neuralNet, self).__init__()
        self.lstm = nn.LSTM(4, 4)         # input_size=4, hidden_size=4
        self.fc1 = nn.Linear(4, 16)
        self.out_real = nn.Linear(16, 1)  # first output component
        self.out_im = nn.Linear(16, 1)    # second output component

    def forward(self, X):
        # X: (seq_len, batch, 4); keep only the last time step of the LSTM output
        x, _ = self.lstm(X)
        x = F.leaky_relu(self.fc1(x[-1].view(X.shape[1], -1)))
        sal_real = F.leaky_relu(self.out_real(x))
        sal_im = F.leaky_relu(self.out_im(x))
        return sal_real, sal_im
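As a quick sanity check of the shapes (my own snippet, not part of the training code), a dummy forward pass:

import torch

net = neuralNet()
dummy = torch.randn(20, 1024, 4)  # (seq_len, batch, features), the default nn.LSTM layout
real, im = net(dummy)
print(real.shape, im.shape)  # torch.Size([1024, 1]) torch.Size([1024, 1])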
And this is the Dataset class I feed to the DataLoader:
from torch.utils.data import Dataset

class _data_(Dataset):
    def __init__(self, X, y, window):
        self.X = X
        self.y = y
        self.window = window

    def __len__(self):
        # number of complete windows, so the last target index stays in range
        return self.X.shape[0] - self.window + 1

    def __getitem__(self, idx):
        # a window of inputs and the output aligned with its last step
        return self.X[idx:idx + self.window], self.y[idx + self.window - 1]
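For reference, this is roughly how I build the loader on top of it (Input and Output are the arrays shown above; I pass drop_last=True so every batch really has 1024 sequences):

from torch.utils.data import DataLoader

dataset = _data_(Input, Output, window=20)
loader = DataLoader(dataset, batch_size=1024, shuffle=True, drop_last=True)

x, y_true = next(iter(loader))
print(x.shape, y_true.shape)  # (1024, 20, 4) and (1024, 2): the batch dimension comes first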
I use the Adam optimizer with a batch size of 1024 (though I’ve tried several different batch sizes and learning rates). This is what my training loop looks like:
network.train()
train_loss = []
for epoch in range(100):
    for x, y_true in loader:
        optimizer.zero_grad()
        # reshape the batch to (seq_len=20, batch=1024, features=4) for the LSTM
        out_real, out_im = network(x.view(20, 1024, 4).type(torch.FloatTensor))
        # RMSE on each of the two output components, summed
        loss_1 = torch.sqrt(((out_real - y_true[:, 0].view(-1, 1).type(torch.FloatTensor)) ** 2).mean())
        loss_2 = torch.sqrt(((out_im - y_true[:, 1].view(-1, 1).type(torch.FloatTensor)) ** 2).mean())
        loss = loss_1 + loss_2
        loss.backward()
        optimizer.step()
        train_loss.append(loss.detach().numpy().reshape(-1))
    print(epoch)
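The setup around the loop is the standard Adam recipe; the learning rate here is just one of the values I tried:

import torch.optim as optim

network = neuralNet()
optimizer = optim.Adam(network.parameters(), lr=1e-3)  # also tried other learning rates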
However, the loss quickly drops to around 0.23 and then just stays there, predicting very similar values no matter the input. I checked the gradients of the LSTM layer and they are very small:
print(network.lstm.weight_ih_l0.grad)
tensor([[ 2.4166e-05, 1.1411e-04, 6.1285e-05, 8.6557e-05],
[ 5.0125e-05, -1.9328e-05, 1.7648e-05, 2.4593e-06],
[ 6.6281e-05, 6.8287e-05, 6.3350e-05, 6.7243e-05],
[ 2.7249e-05, 5.8169e-05, 4.2136e-05, 5.0256e-05],
[ 2.1470e-05, 1.0980e-04, 5.8128e-05, 8.3109e-05],
[ 7.5999e-05, -7.7817e-06, 3.6380e-05, 1.7472e-05],
[ 1.1556e-04, 1.3376e-04, 1.1724e-04, 1.2893e-04],
[ 3.5581e-05, 6.9475e-05, 4.9838e-05, 5.9006e-05],
[-1.0905e-04, -4.3272e-04, -2.4602e-04, -3.3566e-04],
[ 4.4844e-04, 2.4696e-06, 2.3953e-04, 1.3694e-04],
[ 1.7014e-04, 1.8785e-04, 1.6552e-04, 1.8307e-04],
[-2.9961e-04, -1.2150e-03, -6.4513e-04, -9.0194e-04],
[ 2.6096e-05, 2.2836e-04, 1.0026e-04, 1.6018e-04],
[ 1.4060e-04, -6.1505e-05, 3.9611e-05, -1.6231e-05],
[ 2.9458e-04, 1.2405e-04, 1.8125e-04, 1.3770e-04],
[ 4.3766e-05, 1.5784e-04, 8.4850e-05, 1.1989e-04]])
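To compare layers more systematically, a small helper along these lines (run right after loss.backward()) prints the gradient norm of every parameter:

for name, p in network.named_parameters():
    if p.grad is not None:
        print(f"{name}: grad norm = {p.grad.norm().item():.3e}")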
The loss over training:
[plot of the training loss; it flattens out at around 0.23]
This is my first time working with LSTMs. I’ve tried different widths/depths and different activation functions, but I’m still stuck with the same behavior, so I’m assuming the problem is in my code.
Thanks a lot in advance!