Problem with NaN values in the model parameters: weight and bias

Hello,

I'm new to deep learning.
Once my batch is generated and I start training my model, I always get NaN values in
output = model(input_var)
When I debug, I also find NaN values in the model parameters: weight and bias.
These NaN values appear after n iterations.
Do you have any idea where the error could be?

Many thanks for your reply

![Screenshot from 2018-03-15 11-04-08|690x454](upload://b9EYe4Tk1tGJDarCjYR3AaWocnJ.png)

Hello,
your screenshot/link is not displaying properly, which is kind of problematic if you want some help :blush:
A small, formatted snippet of your code might also help people give you some advice
(if I'm not mistaken, surrounding code with ``` formats it so it is easier to read).

def train_epoch_cpu(trainLoader, model, optimizer, criterion, epoch):
    Losses = []

    model.train()
    for i, (ids, tensor) in enumerate(trainLoader):
        input_var = torch.autograd.Variable(tensor)
        # compute output
        output = model(input_var)
        loss = criterion(output)

        # loss = angular_distance(output.data.cpu(), targets.cpu())
        # compute gradient and do SGD step
        optimizer.zero_grad()
        loss.backward()
        Losses.append(loss.data.cpu()[0])
        optimizer.step()

        # print('Train Epoch: [{0}][{1}/{2}]\t'
        #       'angles {ang} ({ang})\t'
        #       'Loss {loss} ({loss})\t'.format(
        #       epoch, i, len(trainLoader), ang=loss, loss=Losses))

    return loss

Here, in output = model(input_var), the returned output is NaN.
When I debug the code, I find the model parameters are NaN as well.
The initially declared net is:
NetDSpace (
(conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
(pool): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
(conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
(fc1): Linear (144 -> 2048)
(fc2): Linear (2048 -> 1024)
(fc3): Linear (1024 -> 400)
)
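
(For anyone debugging the same symptom, here is a minimal sketch of how the moment the parameters break could be narrowed down. It assumes a reasonably recent PyTorch where torch.isnan is available; find_nan_parameters is just an illustrative helper name, not part of the code above.)

import torch

def find_nan_parameters(model):
    # Return the names of all parameter tensors (weights and biases)
    # that already contain NaN values.
    bad = []
    for name, param in model.named_parameters():
        if torch.isnan(param.data).any():
            bad.append(name)
    return bad

Calling this after every optimizer.step() and printing the result shows at which iteration the weights and biases first turn into NaN.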

Is your tensor also returning a NaN (when you make it into a Variable)?

Can you please explain which step or function you mean?
Many thanks.

Sure, I was thinking about this line; maybe there is something that goes wrong in your tensor?

How can I detect that something is wrong?
Normally the input size is [torch.FloatTensor of size 64x1x20x20], where
64 is the batch size and
20x20 is the size of my input image.
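
(A quick way to check whether something is already wrong in the batch itself. This is only a sketch reusing the loop variables from the training code above, and it assumes a PyTorch version that provides torch.isnan and torch.isinf.)

import torch

for i, (ids, tensor) in enumerate(trainLoader):
    # Flag batches that contain NaN or inf before they ever reach the model.
    if torch.isnan(tensor).any() or torch.isinf(tensor).any():
        print('Bad values in batch', i, 'ids:', ids)
    # Unnormalised grayscale images often live in [0, 255]; a large value
    # range is a hint that normalisation is missing.
    print('batch', i,
          'min', float(tensor.min()),
          'max', float(tensor.max()),
          'mean', float(tensor.mean()))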

Is there any error message reported?

There is no error message, but the problem shows up in the printed loss values
(I have modified the size of the net):
NetDSpace (
(conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
(pool): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
(conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
(fc1): Linear (144 -> 120)
(fc2): Linear (120 -> 84)
(fc3): Linear (84 -> 10)
)


Average IOU during train Reg0 , epoch 0 is Variable containing:
0.2594
[torch.FloatTensor of size 1]


Average IOU during train Reg0 , epoch 1 is Variable containing:
0.3301
[torch.FloatTensor of size 1]


Average IOU during train Reg0 , epoch 2 is Variable containing:
nan
[torch.FloatTensor of size 1]


Average IOU during train Reg0 , epoch 3 is Variable containing:
nan
[torch.FloatTensor of size 1]


Average IOU during train Reg0 , epoch 4 is Variable containing:
nan
[torch.FloatTensor of size 1]

My guess is that the inputs are not properly normalised.

How can I ensure the normalisation?
My batch is built from the filename list of the input training grayscale images.

Do you have any suggestion on how to normalise it properly?

Although the proper way is to find the mean and variance for your whole training set and use that to normalise your images (scikit-learn has some classes for this), there is a quicker way to check whether normalisation helps.

Just add a batch normalisation layer (torch.nn.BatchNorm2d) as the very first layer of your network and it will normalise on a per-batch basis (64 images in your case). Not perfect, but it should give you enough intuition about whether this will solve your problem.

And when that indeed helps, you can normalise over your whole training set.
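
(As a rough sketch of that suggestion, the per-batch normalisation would sit directly in front of conv1. The layer name input_bn is chosen here just for illustration; the remaining layers mirror the net printed above.)

import torch.nn as nn
import torch.nn.functional as F

class NetDSpace(nn.Module):
    def __init__(self):
        super(NetDSpace, self).__init__()
        # Normalise every incoming batch of 1-channel images before conv1.
        self.input_bn = nn.BatchNorm2d(1)
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16 * 3 * 3, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.input_bn(x)
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 3 * 3)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x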

Hi Peter,
many thanks for your recommendation.
My code now looks like this:

import torch.nn as nn
import torch.nn.functional as F

class NetDSpace(nn.Module):
    def __init__(self):
        super(NetDSpace, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.conv2_bn = nn.BatchNorm2d(16)
        self.fc1 = nn.Linear(16 * 3 * 3, 120)
        self.fc1_bn = nn.BatchNorm1d(120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 3 * 3)
        x = F.relu(self.fc1_bn(self.fc1(x)))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

So is it correct?
I have tested it, and I no longer get NaN values, but the loss values vary a lot: sometimes they decrease and sometimes they increase (not stable).
One question, please: what did you mean by 'And when that indeed helps, you can normalise over your whole training set'?

I actually meant to say to add the BatchNorm only as the first layer of your network (so before conv1) in order to "simulate" normalisation. But if this works and avoids the NaN, then indeed your problem (or part of it) seems to be normalisation, or more correctly the lack of it.

The downside of BatchNorm is that the normalisation only happens per batch, so 64 images in your case. You normally achieve better results if you do the normalisation based on your complete training set and not just 64 images at a time.

So you calculate the mean and variance over all your training images and then normalise each image using this mean and variance. This would be done in the preprocessing phase (and not be part of your network, as is the case with BatchNorm). I guess some searching on image normalisation should give you some reusable code snippets.
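
(A minimal sketch of that preprocessing step, assuming the trainLoader from earlier yields (ids, tensor) batches of shape [B, 1, 20, 20]; the two-pass approach and variable names here are only illustrative.)

from torchvision import transforms

# Pass 1: accumulate mean and variance over the whole training set.
total, total_sq, count = 0.0, 0.0, 0
for ids, tensor in trainLoader:
    total += float(tensor.sum())
    total_sq += float((tensor ** 2).sum())
    count += tensor.numel()
mean = total / count
std = (total_sq / count - mean ** 2) ** 0.5

# Pass 2: normalise every image with these fixed, dataset-wide statistics
# as part of the preprocessing pipeline (instead of a BatchNorm layer).
normalize = transforms.Normalize(mean=[mean], std=[std])

The resulting normalize transform would then be applied to every image tensor before it is batched.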

I wonder if the normalisation solved your problem? I am interested because I have a similar problem which I cannot solve.

The normalisation didn't resolve the problem on its own, but it can help; the main problem in my case was related to the loss function.

I do not get the point: how can image normalisation, in theory, help to avoid NaNs? Although I understand why normalisation is helpful (in terms of stability), even at the first layer of a network, your point seems very strange to me.

Unnormalized inputs could theoretically create large intermediate activations. If the training diverges due to these high values, you might encounter an overflow and could run into NaNs.
This happened a few times in other posts and you would usually see a very high loss in some iterations before it blows up.
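
(One way to catch that kind of divergence early, sketched against the training loop from the beginning of the thread; the max_norm value and the clip_grad_norm_ call are optional additions, not something from the original code, and loss.item() would replace float(loss.data) on newer PyTorch versions.)

import math
import torch

output = model(input_var)
loss = criterion(output)

loss_value = float(loss.data)
if math.isnan(loss_value) or math.isinf(loss_value):
    # Stop and inspect the current batch before the weights get corrupted.
    print('Loss became non-finite at iteration', i)
else:
    optimizer.zero_grad()
    loss.backward()
    # Clip gradients so a single extreme batch cannot blow up the weights.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    optimizer.step()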
