Not sure about LSTM implementation

Hello! I am writing my first LSTM network and would really appreciate it if someone could tell me whether it is right (the loss goes down very slowly, and before playing around with hyperparameters I want to make sure the code is actually doing what I want). The code is meant to go through some time series and label each point with one of several categories. In the version I am posting here there are just 2 categories: 0 if the value of the point is 1, and 1 otherwise (I know it’s a bit weird, but I didn’t choose the labels :slight_smile: ). So this is the code:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
from fastai.learner import *

torch.manual_seed(1)
torch.cuda.set_device(0)

bs = 2

x_trn = torch.tensor([[1.0000, 1.0000],
        [1.0000, 0.9870],
        [0.9962, 0.9848],
        [1.0000, 1.0000]]).cuda()

y_trn = torch.tensor([[0, 0],
        [0, 1],
        [1, 1],
        [0, 0]]).cuda()

n_hidden = 5
n_classes = 2

class TESS_LSTM(nn.Module):
    def __init__(self, nl):
        super().__init__()
        self.nl = nl
        self.rnn = nn.LSTM(1, n_hidden, nl)
        self.l_out = nn.Linear(n_hidden, n_classes)
        self.init_hidden(bs)
        
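    def forward(self, input):
        # input: (seq_len, bs) -> (seq_len, bs, 1), the default nn.LSTM layout
        outp,h = self.rnn(input.view(len(input), bs, -1), self.h)
        # outp: (seq_len, bs, n_hidden) -> scores of shape (seq_len, bs, n_classes)
        return F.log_softmax(self.l_out(outp))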
    
    def init_hidden(self, bs):
        self.h = (V(torch.zeros(self.nl, bs, n_hidden)),
                  V(torch.zeros(self.nl, bs, n_hidden)))

model = TESS_LSTM(1).cuda()

loss_function = nn.NLLLoss()

optimizer = optim.Adam(model.parameters(), lr=0.01)

for epoch in range(10000):  
    model.zero_grad()
    tag_scores = model(x_trn)
    loss = loss_function(tag_scores.reshape(-1, n_classes), y_trn.reshape(-1))
    loss.backward()
    optimizer.step()
    
    if epoch%1000==0:
        print("Loss at epoch %d = " %epoch, loss)

print(model(x_trn), y_trn)

The (heavily reduced) time series should be [1, 1, 0.9962, 1] with labels [0, 0, 1, 0], and [1, 0.9870, 0.9848, 1] with labels [0, 1, 1, 0], and the batch size should be 2. I really hope I didn’t mess up the dimensions, but I tried to put the data in a shape accepted by the LSTM. This is the output:

Loss at epoch 0 = tensor(1.3929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 1000 = tensor(0.8939, device='cuda:0', grad_fn=<NllLossBackward>) 
Loss at epoch 2000 = tensor(0.8664, device='cuda:0', grad_fn=<NllLossBackward>) 
Loss at epoch 3000 = tensor(0.8390, device='cuda:0', grad_fn=<NllLossBackward>) 
Loss at epoch 4000 = tensor(0.8339, device='cuda:0', grad_fn=<NllLossBackward>) 
Loss at epoch 5000 = tensor(0.8288, device='cuda:0', grad_fn=<NllLossBackward>) 
Loss at epoch 6000 = tensor(0.8246, device='cuda:0', grad_fn=<NllLossBackward>) 
Loss at epoch 7000 = tensor(0.8202, device='cuda:0', grad_fn=<NllLossBackward>) 
Loss at epoch 8000 = tensor(0.8143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss at epoch 9000 = tensor(0.8108, device='cuda:0', grad_fn=<NllLossBackward>)

(tensor([[[-9.0142e-01, -1.2631e+01],
          [-9.3762e-01, -9.6707e+00]],
 
         [[-1.3467e+00, -3.9542e+00],
          [-2.2005e+00, -7.6977e-01]],
 
         [[-2.4500e+01, -1.9363e-02],
          [-2.3349e+01, -6.2210e-01]],
 
         [[-1.0969e+00, -2.1953e+01],
          [-6.9776e-01, -1.8608e+01]]], device='cuda:0',
        grad_fn=<LogSoftmaxBackward>), tensor([[0, 0],
         [0, 1],
         [1, 1],
         [0, 0]], device='cuda:0'))
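
For reference, the data layout described before the output (time along the rows, one series per column) can be reproduced in a small CPU-only sketch. The names `series_a`, `series_b`, `x`, `y` and `x_lstm` are made up for illustration; the labelling rule is the one stated in the post (0 where the value is exactly 1, otherwise 1).

```python
import torch

# The two reduced series from the post, one per batch element.
series_a = torch.tensor([1.0000, 1.0000, 0.9962, 1.0000])
series_b = torch.tensor([1.0000, 0.9870, 0.9848, 1.0000])

# Stack as columns: shape (seq_len=4, batch=2), same layout as x_trn.
x = torch.stack([series_a, series_b], dim=1)

# Labelling rule from the post: 0 where the value is 1, otherwise 1.
y = (x != 1.0).long()

# nn.LSTM with the default batch_first=False expects
# (seq_len, batch, input_size); here input_size is 1.
x_lstm = x.unsqueeze(-1)

print(x.shape, y.tolist(), x_lstm.shape)
```

If `y` matches `y_trn` and `x_lstm` has shape (4, 2, 1), the input side of the model is laid out the way the LSTM expects.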

The loss doesn’t go down very fast (I expected the model to overfit and the loss to get really close to zero). The actual values are okish (the value smaller in magnitude is always the right one), but they can definitely be improved. Can someone tell me if my code is doing what I want, and maybe suggest why the loss is still large? Maybe I need a smaller LR?
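
One thing that can be checked directly from the printed output is which axis the log-softmax normalised over. The sketch below is a CPU-only check, not a confirmed diagnosis: it copies the printed values, exponentiates them, and sums along the time axis and along the class axis. As far as I know, `F.log_softmax` called without `dim` is deprecated and falls back to dim 0 for 3-D input, which here would be the time axis; passing `dim=-1` makes the class axis explicit. The `floor` value is my own arithmetic and only holds under the assumption that the normalisation runs over time.

```python
import math
import torch
import torch.nn.functional as F

# Values copied from the printed output: (seq_len=4, batch=2, n_classes=2).
log_probs = torch.tensor([
    [[-9.0142e-01, -1.2631e+01], [-9.3762e-01, -9.6707e+00]],
    [[-1.3467e+00, -3.9542e+00], [-2.2005e+00, -7.6977e-01]],
    [[-2.4500e+01, -1.9363e-02], [-2.3349e+01, -6.2210e-01]],
    [[-1.0969e+00, -2.1953e+01], [-6.9776e-01, -1.8608e+01]],
])
probs = log_probs.exp()

sum_over_time = probs.sum(dim=0)      # shape (2, 2): one sum per (series, class)
sum_over_classes = probs.sum(dim=-1)  # shape (4, 2): one sum per (step, series)
print(sum_over_time)
print(sum_over_classes)

# Making the class axis explicit: every (step, series) pair now sums to 1.
logits = torch.randn(4, 2, 2)  # stand-in for self.l_out(outp)
fixed = F.log_softmax(logits, dim=-1)
print(fixed.exp().sum(dim=-1))

# Assumption: probabilities are shared across time. A class that is the target
# at k steps of one series can then get at most 1/k at each of those steps.
# Series 1: class 0 at 3 steps, class 1 at 1 step. Series 2: 2 steps each.
floor = (3 * math.log(3) + 1 * math.log(1) + 2 * math.log(2) + 2 * math.log(2)) / 8
print(floor)
```

If the sums over time come out as 1 while the sums over classes do not, the loss cannot reach zero on these labels. Under that assumption the lowest achievable mean NLL is roughly 0.76, which is close to where the training curve flattens.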

Side note: I used the fastai package (its V helper) in the init_hidden() function.
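
For anyone who wants to run this without fastai, here is a minimal sketch of the same model in plain PyTorch. It relies on the documented behaviour that `nn.LSTM` uses zero initial hidden and cell states when none are passed. `TinyTagger` is a hypothetical name, not the class from the post, and the `dim=-1` in the output layer is my own choice to normalise over the class axis.

```python
import torch
import torch.nn as nn

class TinyTagger(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, n_hidden=5, n_classes=2, n_layers=1):
        super().__init__()
        self.rnn = nn.LSTM(1, n_hidden, n_layers)
        self.l_out = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        # x: (seq_len, batch) -> (seq_len, batch, 1)
        # No state is passed, so nn.LSTM starts from zeros on x's device.
        outp, _ = self.rnn(x.unsqueeze(-1))
        # Normalise over the class axis (last dimension).
        return torch.log_softmax(self.l_out(outp), dim=-1)

torch.manual_seed(1)
model = TinyTagger()
x = torch.ones(4, 2)          # (seq_len=4, batch=2)
out = model(x)                # (4, 2, 2)

# Passing explicit zero states gives the same LSTM output as passing none.
h0 = torch.zeros(1, 2, 5)     # (n_layers, batch, n_hidden)
c0 = torch.zeros(1, 2, 5)
outp_explicit, _ = model.rnn(x.unsqueeze(-1), (h0, c0))
outp_default, _ = model.rnn(x.unsqueeze(-1))
print(out.shape, torch.allclose(outp_explicit, outp_default))
```

Leaving the state out also avoids tying the model to a fixed batch size, since nothing of shape (n_layers, bs, n_hidden) is stored on the module.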