Categorical loss not working

Hello! I am trying to build a simple LSTM that should predict, for each step of a time series, whether the value is 1 (in which case it should predict class 0) or not (in which case it should predict class 1). Here is my code:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torch.optim as optim
import numpy as np

torch.manual_seed(1)
#torch.cuda.set_device(0)

bs = 2

x_trn = torch.tensor([[1.0000, 1.0000],
        [1.0000, 0.9870],
        [0.9962, 0.9848],
        [1.0000, 1.0000]])#.cuda()

y_trn = torch.tensor([[0, 0],
        [0, 1],
        [1, 1],
        [0, 0]])#.cuda()

n_hidden = 5
n_classes = 2

class TESS_LSTM(nn.Module):
    def __init__(self, nl):
        super().__init__()
        self.nl = nl
        self.rnn = nn.LSTM(1, n_hidden, nl)
        self.l_out = nn.Linear(n_hidden, n_classes)
        self.init_hidden(bs)

    def forward(self, input):
        outp,h = self.rnn(input.view(len(input), bs, -1), self.h)
        return F.log_softmax(self.l_out(outp),dim=1)

    def init_hidden(self, bs):
        self.h = (Variable(torch.zeros(self.nl, bs, n_hidden)),
                  Variable(torch.zeros(self.nl, bs, n_hidden)))

model = TESS_LSTM(1)#.cuda()

loss_function = nn.NLLLoss()

optimizer = optim.Adam(model.parameters(), lr=0.01)

for epoch in range(10000):  
    model.zero_grad()
    tag_scores = model(x_trn)
    loss = loss_function(tag_scores.reshape(4*bs,n_classes), y_trn.reshape(4*bs))
    loss.backward()
    optimizer.step()

    if epoch%1000==0:
        print("Loss at epoch %d = " %epoch, loss)

print(model(x_trn), y_trn)

So the two time series are (here the batch size is 2) [1, 1, 0.9962, 1] and [1, 0.9870, 0.9848, 1], and the desired outputs are [0, 0, 1, 0] and [0, 1, 1, 0]. This is the output of my network:

Loss at epoch 0 =  tensor(0.6932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 1000 =  tensor(0.5235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 2000 =  tensor(0.5207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 3000 =  tensor(0.5202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 4000 =  tensor(0.5200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 5000 =  tensor(0.5200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 6000 =  tensor(0.5199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 7000 =  tensor(0.5199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 8000 =  tensor(0.5199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 9000 =  tensor(0.5199, device='cuda:0', grad_fn=<NllLossBackward>)
tensor([[[-6.9315e-01, -6.9315e-01],
         [-6.9315e-01, -6.9315e-01]],

        [[-3.5286e-05, -1.0418e+01],
         [-1.0249e+01, -3.0518e-05]],

        [[-6.9175e-01, -6.9316e-01],
         [-6.9455e-01, -6.9313e-01]],

        [[-6.9307e-01, -6.8876e-01],
         [-6.9322e-01, -6.9756e-01]]], device='cuda:0',
       grad_fn=<LogSoftmaxBackward>) tensor([[0, 0],
        [0, 1],
        [1, 1],
        [0, 0]], device='cuda:0')

The loss goes down for a bit, but then it doesn’t really decrease any further, no matter what I try. What is weird is that, for example, the first two predictions are 50%-50% between 0 and 1 (both log-probabilities are -6.9315e-01, i.e. log 0.5), instead of predicting 0 and 0. I am not sure why the other values are moving in the right direction (some very slowly) while these are not, even though that would obviously decrease the loss. Am I doing something wrong with the way I am passing the data to the network? Or what is going on? Any advice is really appreciated. Thank you!
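
For reference, here is a quick standalone shape check of what my forward pass produces (a minimal sketch with dummy tensors, mirroring the sizes above):

import torch
import torch.nn as nn

x = torch.randn(4, 2, 1)       # what the LSTM receives: (seq_len=4, bs=2, input_size=1)
rnn = nn.LSTM(1, 5, 1)         # same sizes as self.rnn above
out, _ = rnn(x)                # out: (4, 2, 5) = (seq_len, bs, n_hidden)
scores = nn.Linear(5, 2)(out)  # scores: (4, 2, 2) = (seq_len, bs, n_classes)
print(scores.shape)            # torch.Size([4, 2, 2]); dim 0 = time, dim 1 = batch, dim 2 = class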

Hi, can you show a minimal reproducible example?

Hello! That is the whole code

Sorry, I didn’t notice that it needs scrolling…

Did you try a larger hidden size? Maybe the capacity is not enough.
Also, Adam’s lr is usually in the 1e-4 ~ 1e-3 range, so lr=0.01 might be the reason.
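
Something like this, for example (illustrative values only, reusing your TESS_LSTM class; since n_hidden is a module-level global, change it before building the model):

n_hidden = 32                                        # illustrative: more capacity than 5 units
model = TESS_LSTM(1)                                 # rebuild so the new size takes effect
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # within the usual 1e-4 ~ 1e-3 range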