RNN weights get converted to NaN values

I’ve created the RNN class as:

import torch
import torch.nn as nn
from torch.autograd import Variable

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size, hidden_size)    # input -> hidden
        self.h2o = nn.Linear(hidden_size, output_size)   # hidden -> output
        self.h2h = nn.Linear(hidden_size, hidden_size)   # hidden -> hidden (recurrence)
        self.Relu = nn.ReLU()
        self.softmax = nn.Tanh()  # note: despite the name, this is a Tanh, not a Softmax

    def forward(self, input, hidden):
        h = self.Relu(self.h2h(hidden) + self.i2h(input))
        o = self.softmax(self.h2o(h))
        return o, h

    def init_hidden(self):
        return Variable(torch.zeros(1, self.hidden_size), requires_grad=True)

Then, I create the network as:

rnn = RNN(n_chars, 90, n_chars)
criterion = nn.MSELoss()
learning_rate = 0.05
optimizer = torch.optim.Adam(rnn.parameters(), lr = learning_rate)
hidden = rnn.init_hidden()
epochs = 5

Before training, the value of rnn.i2h.weight.grad is None (no backward pass has run yet).

But when I train the network, after 2-3 iterations all of the values in rnn.i2h.weight.grad become NaN, which makes training the network impossible.
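To see where it happens, the gradients can be inspected right after the backward pass. This is just a minimal sketch of such a check (inp and target stand in for one training example; the actual training loop is not shown here):

output, hidden = rnn(inp, hidden)
loss = criterion(output, target)
optimizer.zero_grad()
loss.backward()
# check whether any gradient entry has become NaN
if torch.isnan(rnn.i2h.weight.grad).any():
    print("NaN in i2h.weight.grad")
optimizer.step()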

Why is this happening?

I came across this problem yesterday. A quick fix seems to be lowering your learning rate: with 1e-2 it kept throwing NaNs; at 1e-3 it converged fine.

This might be helpful, otherwise.
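In case it helps, this is roughly what one training step looks like with the lower learning rate; the clip_grad_norm_ call is an extra safeguard against exploding gradients, not something from the original post:

# same model and loss as above, just with a smaller learning rate
optimizer = torch.optim.Adam(rnn.parameters(), lr=1e-3)

output, hidden = rnn(inp, hidden)   # inp / target are placeholders for one example
loss = criterion(output, target)
optimizer.zero_grad()
loss.backward()
# optional: clip gradients so a single large step cannot blow up the weights
torch.nn.utils.clip_grad_norm_(rnn.parameters(), max_norm=1.0)
optimizer.step()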
