# Loss remains constant in training the network

Here is my network:

``````
import torch
import torch.nn as nn
from torch.autograd import Variable

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)
        self.h2h = nn.Linear(hidden_size, hidden_size)
        self.Relu = nn.ReLU()
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        # New hidden state from the previous hidden state and the current input
        h = self.Relu(self.h2h(hidden) + self.i2h(input))
        o = self.softmax(self.h2o(h))
        return o, h

    def init_hidden(self):
        # Body missing from the post; a zero initial state is assumed here.
        return Variable(torch.zeros(1, self.hidden_size))

rnn = RNN(n_chars, 90, n_chars)
criterion = nn.L1Loss()
learning_rate = 0.05
optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)
hidden = rnn.init_hidden()
epochs = 5

rnn.cuda()
for epoch in range(epochs):
    for i in range(len(X)):
        # Feed the sequence one element at a time
        for ele in X[i]:
            output, hidden = rnn(Variable(ele.t()).cuda(), hidden.cuda())
        loss = criterion(output, Variable(Y[i]).cuda())

        loss.backward(retain_graph=True)
        optimizer.zero_grad()
        optimizer.step()
        if i % 10 == 0:
            print(loss)
``````

The loss I get is approximately constant at 4.513. Why is the loss not changing?

You are deleting the gradients after they were computed and before the weight updates were performed.
Try moving `optimizer.zero_grad()` so that it runs before the backward pass, e.g.:

``````
for epoch in range(epochs):
    for i in range(len(X)):
        optimizer.zero_grad()  # clear old gradients before the new backward pass
        for ele in X[i]:
            output, hidden = rnn(Variable(ele.t()).cuda(), hidden.cuda())
        loss = criterion(output, Variable(Y[i]).cuda())

        loss.backward(retain_graph=True)
        optimizer.step()
``````

`optimizer.zero_grad()` sets all gradients to zero, i.e. it basically deletes the gradients of all `Parameters` that were passed to the optimizer.
You need it because the gradients are not cleared automatically; without it they accumulate across iterations.
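To see the accumulation for yourself, here is a minimal sketch on CPU. The tensor `w`, the toy loss, and the helper `grad_after_backward` are made up for illustration and are not part of the network in this thread.

```python
import torch

# Hypothetical toy parameter and optimizer, just for the demonstration
w = torch.ones(3, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

def grad_after_backward(reset_first):
    """Run one backward pass and return a copy of w.grad."""
    if reset_first:
        optimizer.zero_grad()
    loss = (2.0 * w).sum()  # d(loss)/dw is 2 for every element
    loss.backward()
    return w.grad.clone()

first = grad_after_backward(reset_first=True)    # fresh gradient
second = grad_after_backward(reset_first=False)  # added on top of the old one
third = grad_after_backward(reset_first=True)    # fresh gradient again

print(first, second, third)
```

No `optimizer.step()` is called here, so `w` stays fixed and the only thing that changes between calls is whether the old gradient was cleared.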

I shifted the `optimizer.zero_grad()` above, but the loss is still constant. When I remove the optimizer completely, the loss remains exactly constant at 4.5315. I have this feeling that the weight update isn’t happening.

It’s probably not the error, but you should call `.cuda()` on the `Tensor` before wrapping it in a `Variable` (for example, `Variable(Y[i].cuda())` instead of `Variable(Y[i]).cuda()`).
Could you check the gradients with `rnn.i2h.weight.grad`?
Also could you provide the shapes of `X` and `Y`?
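If you want to check all parameters at once instead of only `i2h`, something along these lines might help. This is just a sketch: `report_nan_grads` is a hypothetical helper, and a small `nn.Linear` stands in for your `rnn` so the snippet runs on its own.

```python
import torch
import torch.nn as nn

def report_nan_grads(model):
    """Return the names of parameters whose gradient contains NaN."""
    bad = []
    for name, param in model.named_parameters():
        if param.grad is not None and torch.isnan(param.grad).any():
            bad.append(name)
    return bad

# Stand-in model (assumption); replace with your own module
model = nn.Linear(4, 2)
loss = model(torch.randn(3, 4)).sum()
loss.backward()
clean = report_nan_grads(model)  # nothing flagged for healthy gradients

# Inject a NaN by hand, only to show what a hit looks like
model.weight.grad[0, 0] = float("nan")
flagged = report_nan_grads(model)

print(clean, flagged)
```

Calling the helper right after `loss.backward()` in the first iteration would show whether the NaNs are there from the start or appear later.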

`print(rnn.i2h.weight.grad)` gives me a 90x90 matrix consisting entirely of `nan` values. Also, each element of `X`, i.e. `X[i]`, is a vector of length 90. `Y` is also a vector of length 90.

Why are they nan values?

Do you see these values from the beginning of your training?

Yeah they were nan from the very beginning. Why?
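One way to narrow down where the `nan` first appears in the backward pass is autograd's anomaly detection, assuming a PyTorch version that ships `torch.autograd.detect_anomaly`. The sketch below does not use the network from this thread; it builds a deliberately broken toy graph (`sqrt` at zero) in a hypothetical helper `find_nan_source`, only to show how the error surfaces.

```python
import torch

def find_nan_source():
    """Return the error message raised when backward produces NaN."""
    x = torch.zeros(1, requires_grad=True)
    with torch.autograd.detect_anomaly():
        # sqrt has an infinite slope at 0; multiplying by 0 turns the
        # incoming gradient into 0, so the backward pass computes 0 / 0
        y = (torch.sqrt(x) * 0).sum()
        try:
            y.backward()
        except RuntimeError as err:
            return str(err)
    return None

message = find_nan_source()
print(message)
```

Wrapping the forward and backward pass of a single training iteration in the same context manager should point at the operation whose backward returned `nan`. Anomaly detection slows things down, so it is best used for debugging only.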

Have you found the reason?