LSTM Not Training

I’m writing a custom LSTM, trying to implement Amazon’s DeepAR from scratch. It’s definitely not training, and I suspect the training code I wrote is the problem: the loss has tripled since the first iteration. Let me know what you think.

def training(input, true_values):
    hidden = hidden_init
    c = c_init
    losses = []
    for epoch in range(1):
        for i in range(input.size()[0]):
            our_input = input[i]
            our_true_values = true_values[i]

            model.zero_grad()
            update_h = model.double().next(our_input, (hidden, c))[0].detach()
            hidden = update_h
            update_c = model.next(our_input, (hidden, c))[1].detach()
            c = update_c
            output = model.probout(hidden)
            
            losses.append(output)
            the_loss = loss(output, our_true_values)
            the_loss.backward()
            optimizer.step()

    return losses

Can you share the code of your model?

Your training loop also looks rather unusual. What is model.next() doing, and why is it called twice with our_input as a parameter? I assume that next() and probout() together do the things that are usually packed into a single forward() method. Did you adapt some existing code? The structure looks very uncommon.
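For reference, a conventional PyTorch module packs that single recurrent step and the output head into forward(), so the training loop calls the model once per step. A minimal sketch, using the built-in nn.LSTMCell rather than your custom cell (StepModel and head are just illustrative names):

import torch
import torch.nn as nn

class StepModel(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)  # one recurrent step
        self.head = nn.Linear(hidden_size, 1)             # hidden state -> prediction

    def forward(self, x, state):
        h, c = self.cell(x, state)   # update hidden and cell state
        out = self.head(h)           # prediction for this time step
        return out, (h, c)           # hand the new state back to the caller

The loop would then just do output, (hidden, c) = model(our_input, (hidden, c)) once per time step.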

Certainly! I realize calling .next() twice was a silly error. I’ve replaced it with update_h, update_c = model.next(our_input, (hidden, c)) and detach both tensors before carrying them to the next step.
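So the inner part of the loop now reads:

model.zero_grad()
update_h, update_c = model.next(our_input, (hidden, c))
hidden = update_h.detach()   # carry the states forward, detached from the graph
c = update_c.detach()
output = model.probout(hidden)

losses.append(output)
the_loss = loss(output, our_true_values)
the_loss.backward()
optimizer.step()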
It’s certainly non-standard — I’m quite new to this. Here’s the model code (messy):

import torch
import torch.nn as nn

class LSTM_encoder(nn.Module):
    def __init__(self, input_length, hidden_length):
        super(LSTM_encoder, self).__init__()
        self.input_length = input_length
        self.hidden_length = hidden_length

        # input-to-hidden transforms for the gates and the hidden-state update
        self.forgetb = nn.Linear(self.input_length, self.hidden_length, bias=True)
        self.inb = nn.Linear(self.input_length, self.hidden_length, bias=True)
        self.outb = nn.Linear(self.input_length, self.hidden_length, bias=True)
        self.memb = nn.Linear(self.input_length, self.hidden_length, bias=True)
        self.nextb = nn.Linear(self.input_length, self.hidden_length, bias=True)

        # project the hidden state to the parameters of the output distribution
        self.meanh = nn.Linear(self.hidden_length, 1, bias=True)
        self.varh = nn.Linear(self.hidden_length, 1, bias=True)

        self.tanh = nn.Tanh()
        self.sig = nn.Sigmoid()
        self.softmax = nn.LogSoftmax(dim=-1)  # currently unused
        self.softplus = nn.Softplus()

        # hidden-to-hidden transforms for the same gates (no bias)
        self.forgetnb = nn.Linear(self.hidden_length, self.hidden_length, bias=False)
        self.innb = nn.Linear(self.hidden_length, self.hidden_length, bias=False)
        self.outnb = nn.Linear(self.hidden_length, self.hidden_length, bias=False)
        self.memnb = nn.Linear(self.hidden_length, self.hidden_length, bias=False)
        self.nextnb = nn.Linear(self.hidden_length, self.hidden_length, bias=False)
    
    def forget(self, x, h):
        return self.sig(self.forgetb(x) + self.forgetnb(h))
    
    def input(self,x,h):
        return self.sig(self.inb(x)+self.innb(h))
    
    def memory(self, input, forget_factor, x, h, c_prev):
        # new cell state: gated candidate plus the forgotten previous cell state
        a = self.memb(x)
        b = self.memnb(h)
        y = self.tanh(a + b)
        z = y * input
        c = forget_factor * c_prev
        c_next = z + c
        return c_next
    
    def out(self, x, h):
        return self.sig(self.outb(x)+self.outnb(h))

    def mean(self, h):
        return self.meanh(h)

    def var(self, h):
        a = self.varh(h)
        return self.softplus(a)

    # We assume normally distributed noise
    def probout(self, h):
        mean = self.mean(h)
        var = self.var(h)
        # draw a sample; note that torch.normal treats its second argument as a standard deviation
        s = torch.normal(mean, var)
        return s

    def next(self, x, input_pair):
        # one recurrent step: returns the next hidden state and the next cell state
        (h, c_prev) = input_pair
        input = self.input(x, h)
        forget_factor = self.forget(x, h)
        c_next = self.memory(input, forget_factor, x, h, c_prev)
        out = self.out(x, h)
        e = self.nextb(x)
        d = self.nextnb(h)
        h_next = out * self.tanh(e + d)  # built from nextb/nextnb; a textbook LSTM uses tanh(c_next) here
        return h_next, c_next
    
    def init_hidden(self, x):
        # random initial hidden state
        return torch.zeros(1, self.hidden_length).double() + torch.randn(1, self.hidden_length).double()

    def init_c(self, x):
        # random initial cell state
        return torch.zeros(1, self.hidden_length).double() + torch.randn(1, self.hidden_length).double()