Hello! I am trying to build a simple LSTM that labels each step of a time series: it should predict 0 when the value at that step is 1, and 1 otherwise. Here is my code:

```
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torch.optim as optim
import numpy as np

torch.manual_seed(1)
#torch.cuda.set_device(0)

bs = 2
x_trn = torch.tensor([[1.0000, 1.0000],
                      [1.0000, 0.9870],
                      [0.9962, 0.9848],
                      [1.0000, 1.0000]])#.cuda()
y_trn = torch.tensor([[0, 0],
                      [0, 1],
                      [1, 1],
                      [0, 0]])#.cuda()
n_hidden = 5
n_classes = 2

class TESS_LSTM(nn.Module):
    def __init__(self, nl):
        super().__init__()
        self.nl = nl
        self.rnn = nn.LSTM(1, n_hidden, nl)
        self.l_out = nn.Linear(n_hidden, n_classes)
        self.init_hidden(bs)

    def forward(self, input):
        outp,h = self.rnn(input.view(len(input), bs, -1), self.h)
        return F.log_softmax(self.l_out(outp),dim=1)

    def init_hidden(self, bs):
        self.h = (Variable(torch.zeros(self.nl, bs, n_hidden)),Variable(torch.zeros(self.nl, bs, n_hidden)))

model = TESS_LSTM(1)#.cuda()
loss_function = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

for epoch in range(10000):
    model.zero_grad()
    tag_scores = model(x_trn)
    loss = loss_function(tag_scores.reshape(4*bs,n_classes), y_trn.reshape(4*bs))
    loss.backward()
    optimizer.step()
    if epoch%1000==0:
        print("Loss at epoch %d = " %epoch, loss)

print(model(x_trn), y_trn)
```
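To make the intended data layout explicit, here is a minimal sketch of how I understand the reshaping in `forward` and in the loss call. It assumes the default `batch_first=False` of `nn.LSTM`, so the input is read as `(seq_len, batch, input_size)`. The names `series_0`, `series_1`, `flat_targets`, and `step_and_batch` are only illustrative and do not appear in my actual code.

```python
import torch

bs = 2
x_trn = torch.tensor([[1.0000, 1.0000],
                      [1.0000, 0.9870],
                      [0.9962, 0.9848],
                      [1.0000, 1.0000]])
y_trn = torch.tensor([[0, 0],
                      [0, 1],
                      [1, 1],
                      [0, 0]])

# Same reshape as in forward(): (seq_len=4, batch=2, input_size=1).
x = x_trn.view(len(x_trn), bs, -1)
print(x.shape)

# Each column of x_trn is one series; rows are time steps.
series_0 = x[:, 0, 0]
series_1 = x[:, 1, 0]
print(series_0)
print(series_1)

# Flattening the targets as in the loss call walks time step by time step,
# alternating between the two series within each step.
flat_targets = y_trn.reshape(4 * bs)
print(flat_targets)

# A (seq_len, bs, n_classes) output reshaped to (seq_len * bs, n_classes)
# uses the same ordering: flat row i corresponds to (t = i // bs, b = i % bs).
step_and_batch = [(i // bs, i % bs) for i in range(4 * bs)]
print(step_and_batch)
```

If that reading is right, the flattened scores and flattened targets line up row by row, so the reshape in the loss call should not be mixing up the two series.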

So the 2 time series (the batch size is 2) are `[1, 1, 0.9962, 1]` and `[1, 0.9870, 0.9848, 1]`, and the desired outputs are `[0, 0, 1, 0]` and `[0, 1, 1, 0]`. This is the output of my network:

```
Loss at epoch 0 = tensor(0.6932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 1000 = tensor(0.5235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 2000 = tensor(0.5207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 3000 = tensor(0.5202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 4000 = tensor(0.5200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 5000 = tensor(0.5200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 6000 = tensor(0.5199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 7000 = tensor(0.5199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 8000 = tensor(0.5199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 9000 = tensor(0.5199, device='cuda:0', grad_fn=<NllLossBackward>)
tensor([[[-6.9315e-01, -6.9315e-01],
[-6.9315e-01, -6.9315e-01]],
[[-3.5286e-05, -1.0418e+01],
[-1.0249e+01, -3.0518e-05]],
[[-6.9175e-01, -6.9316e-01],
[-6.9455e-01, -6.9313e-01]],
[[-6.9307e-01, -6.8876e-01],
[-6.9322e-01, -6.9756e-01]]], device='cuda:0',
grad_fn=<LogSoftmaxBackward>) tensor([[0, 0],
[0, 1],
[1, 1],
[0, 0]], device='cuda:0')
```

The loss goes down for a bit, but then it stops decreasing, no matter what I try. What is weird is that, for example, the first 2 predictions are split 50%-50% between 0 and 1 (both log-probabilities are -6.9315e-01, i.e. log 0.5) instead of predicting 0 and 0. I am not sure why some of the other values are moving in the right direction (a few very slowly) while these are not, even though fixing them would obviously decrease the loss. Am I doing something wrong in the way I pass the data to the network? Or is something else going on? Any advice is really appreciated. Thank you!
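One check that might help narrow this down is to look at which axis `F.log_softmax` normalizes over for a tensor shaped like my model output, `(seq_len, bs, n_classes)`. The sketch below uses a random tensor as a stand-in for `self.l_out(outp)` (an assumption, not my trained model) and compares `dim=1` with `dim=2`. The names `logits`, `p_dim1`, and `p_dim2` are illustrative only.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, bs, n_classes = 4, 2, 2

# Random stand-in for the linear layer output, shape (seq_len, bs, n_classes).
logits = torch.randn(seq_len, bs, n_classes)

# Convert log-probabilities back to probabilities for each choice of dim.
p_dim1 = F.log_softmax(logits, dim=1).exp()
p_dim2 = F.log_softmax(logits, dim=2).exp()

# With dim=1 the values sum to 1 across the batch axis (the two series).
print(p_dim1.sum(dim=1))
# With dim=1 the sums across the class axis are generally not 1.
print(p_dim1.sum(dim=2))
# With dim=2 the values sum to 1 across the class axis.
print(p_dim2.sum(dim=2))

# The value printed for the first two predictions is log(0.5).
half = torch.tensor(-0.69315).exp()
print(half)
```

If probabilities are meant to be normalized per time step and per series over the two classes, the axis that sums to 1 should be the last one, so this comparison may point to where things go wrong.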