Loss remains constant while training the network

Here is my network:

import torch
import torch.nn as nn
from torch.autograd import Variable

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size , hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)
        self.h2h = nn.Linear(hidden_size, hidden_size)
        self.Relu = nn.ReLU()
        self.softmax = nn.LogSoftmax(dim = 1)

    def forward(self, input, hidden):
        h = self.Relu(self.h2h(hidden)+  self.i2h(input))
        o = self.softmax(self.h2o(h))
        return o, h

    def init_hidden(self):
        return Variable(torch.zeros(1, self.hidden_size), requires_grad=True)

rnn = RNN(n_chars, 90, n_chars)
criterion = nn.L1Loss()
learning_rate = 0.05
optimizer = torch.optim.Adam(rnn.parameters(), lr = learning_rate)
hidden = rnn.init_hidden()
epochs = 5

rnn.cuda()
for epoch in range(epochs):
    for i in range(len(X)):
        for ele in X[i]:
            output, hidden = rnn(Variable(ele.t()).cuda(), hidden.cuda())
        loss = criterion(output, Variable(Y[i]).cuda())
        
        loss.backward(retain_graph=True)
        optimizer.zero_grad()
        optimizer.step()
        if (i%10 == 0):
            print(loss)

The loss I get is approximately constant at 4.513. Why is the loss not changing?

You are clearing the gradients after loss.backward() has computed them and before optimizer.step() applies them, so the weight update never sees a gradient.
Try moving optimizer.zero_grad() to the start of the iteration, e.g.:

for epoch in range(epochs):
    for i in range(len(X)):
        optimizer.zero_grad()
        for ele in X[i]:
            output, hidden = rnn(Variable(ele.t()).cuda(), hidden.cuda())
        loss = criterion(output, Variable(Y[i]).cuda())
        
        loss.backward(retain_graph=True)
        optimizer.step()
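To see the effect of the ordering in isolation, here is a minimal sketch on CPU. The tiny nn.Linear model, the random data and the train_step helper are made up for illustration and are not part of the original network. It runs one update with each ordering and reports whether the weights moved.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 4)  # made-up inputs
y = torch.randn(8, 1)  # made-up targets


def train_step(buggy_order):
    """Run one update and return True if the weights changed."""
    model = nn.Linear(4, 1)  # hypothetical stand-in for the RNN
    optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
    before = model.weight.detach().clone()

    if not buggy_order:
        optimizer.zero_grad()  # clear old gradients before the forward pass
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    if buggy_order:
        optimizer.zero_grad()  # wipes the gradients that backward() just computed
    optimizer.step()

    return not torch.equal(before, model.weight.detach())


print("buggy order changed weights:  ", train_step(buggy_order=True))
print("correct order changed weights:", train_step(buggy_order=False))
```

With the buggy ordering the optimizer has nothing to apply, so the weights stay exactly where they were.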

What exactly does optimizer.zero_grad() do?

It sets all gradients to zero, i.e. it basically clears the gradients of all Parameters that were passed to the optimizer.
You need it because the gradients are not cleared automatically; without it, each backward() call adds to the gradients left over from previous iterations.
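A small sketch of that accumulation, using a single scalar parameter w (a made-up example, not the network from the question) so the numbers can be checked by hand:

```python
import torch

# A single scalar parameter, so the gradient is easy to follow by hand.
w = torch.tensor(2.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# d(3*w)/dw is 3, so every backward() call contributes 3 to w.grad.
(3 * w).backward()
grad_after_first = w.grad.item()

# No zero_grad() in between: the second call adds to the stored gradient.
(3 * w).backward()
grad_accumulated = w.grad.item()

# Clearing first gives the gradient of the current iteration only.
optimizer.zero_grad()
(3 * w).backward()
grad_after_reset = w.grad.item()

print(grad_after_first, grad_accumulated, grad_after_reset)  # 3.0 6.0 3.0
```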

I moved optimizer.zero_grad() up as suggested, but the loss is still constant.

When I remove the optimizer completely, the loss remains exactly constant at 4.5315. I have this feeling that the weight update isn’t happening.

It’s probably not the cause of the problem, but you should call .cuda() on the Tensor before wrapping it in a Variable (in this case, Variable(Y[i].cuda()) instead of Variable(Y[i]).cuda()).
Could you check the gradients with rnn.i2h.weight.grad?
Also, could you provide the shapes of X and Y?
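If you want to check every parameter at once rather than only i2h, something like the sketch below might help. report_bad_gradients is a hypothetical helper name, and the nn.Linear model is only a stand-in. The second call feeds an input containing a NaN to show one way gradients can end up non-finite; whether that is what happens in your case is an assumption you would have to verify.

```python
import torch
import torch.nn as nn


def report_bad_gradients(model):
    """Return (name, problem) pairs for parameters with missing or non-finite gradients."""
    bad = []
    for name, param in model.named_parameters():
        if param.grad is None:
            bad.append((name, "no gradient"))
        elif not torch.isfinite(param.grad).all():
            bad.append((name, "nan or inf"))
    return bad


model = nn.Linear(3, 2)  # hypothetical stand-in model

# Clean input: all gradients should be finite.
model(torch.ones(1, 3)).sum().backward()
healthy = report_bad_gradients(model)

# Input containing a NaN: the weight gradient picks it up.
model.zero_grad()
model(torch.tensor([[1.0, float("nan"), 2.0]])).sum().backward()
poisoned = report_bad_gradients(model)

print(healthy)
print(poisoned)
```

Calling it right after loss.backward() in the training loop shows which layers are affected on a given iteration.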

print(rnn.i2h.weight.grad) gives me a 90x90 matrix consisting entirely of nan values. Also, each element of X, i.e. X[i], is a vector of length 90. Y is also a vector of length 90.

Why are they nan values?

Do you see these values from the beginning of your training?

Yeah they were nan from the very beginning. Why?
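One way to narrow this down is autograd's anomaly detection, which raises an error at the first backward function that returns nan. The sketch below uses a deliberately contrived expression (sqrt at 0 multiplied by 0), not the network from this thread, just to show what the tool reports.

```python
import torch

x = torch.tensor(0.0, requires_grad=True)

message = None
with torch.autograd.detect_anomaly():
    # sqrt has an infinite slope at 0; the incoming gradient is 0,
    # so the backward pass computes 0 / 0, which is nan.
    y = torch.sqrt(x) * 0.0
    try:
        y.backward()
    except RuntimeError as err:
        message = str(err)

# The error message names the backward function that produced the nan.
print(message)
```

Wrapping one iteration of the training loop (forward pass and loss.backward()) in the same context manager should point at the first operation that produces nan. It slows training down, so it is best used only while debugging.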

Have you found the reason?