I’m new to deep learning.
Once my batch is generated and I start training my model, I always run into NaN values in
output = model(input_var)
When I debug, I also find NaN values in the model parameters (weights and biases).
These NaN values appear after n iterations.
Do you have any idea where the error could come from?
Hello,
your screenshot/link is not displaying properly, which is a problem if you want some help.
A short, formatted snippet of your code would also help people give you some advice
(if I’m not mistaken, ``` surrounding code formats it to be easier to read)
Losses = []
model.train()
for i, (ids, tensor) in enumerate(trainLoader):
    input_var = torch.autograd.Variable(tensor)
    # compute output
    output = model(input_var)
    loss = criterion(output)
    # loss = angular_distance(output.data.cpu(), targets.cpu())
    # compute gradient and do SGD step
    optimizer.zero_grad()
    loss.backward()
    Losses.append(loss.data.cpu()[0])
    optimizer.step()
    # print('Train Epoch: [{0}][{1}/{2}]\t'
    #       'angles {ang} ({ang})\t'
    #       'Loss {loss} ({loss})\t'.format(
    #       epoch, i, len(trainLoader), ang=loss, loss=Losses))
return loss
Here, in output = model(input_var), the returned output is NaN.
When I debug the code, I find that the model parameters are NaN as well.
The net as initially declared is:
NetDSpace (
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (pool): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear (144 -> 2048)
  (fc2): Linear (2048 -> 1024)
  (fc3): Linear (1024 -> 400)
)
How can I detect that something is wrong?
Normally the input size is [torch.FloatTensor of size 64x1x20x20]
64 is the batch size
20x20 is the size of my input image
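One way to detect this kind of problem is to scan the parameters for non-finite values after each optimizer step. The snippet below is only a minimal sketch: `find_non_finite` is a hypothetical helper name, and the small `nn.Linear` model is a stand-in used to make the example runnable, not the net from this thread.

```python
import torch
import torch.nn as nn


def find_non_finite(model: nn.Module) -> list:
    """Return the names of parameters that contain NaN or Inf values."""
    bad = []
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            bad.append(name)
    return bad


# Hypothetical stand-in model, only here so the sketch runs on its own.
model = nn.Linear(4, 2)
print(find_non_finite(model))  # []

# Simulate a corrupted weight to show what the check reports.
with torch.no_grad():
    model.weight[0, 0] = float("nan")
print(find_non_finite(model))  # ['weight']
```

Calling such a check right after `optimizer.step()` shows the first iteration in which a parameter turns bad. `torch.autograd.set_detect_anomaly(True)` is another option worth trying, since it reports the backward operation that produced a NaN, at the cost of slower training.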
There is no error message, but the problem shows up in the printed loss values:
(I have modified the size of the net)
NetDSpace (
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (pool): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear (144 -> 120)
  (fc2): Linear (120 -> 84)
  (fc3): Linear (84 -> 10)
)
Average IOU during train Reg0 , epoch 0 is Variable containing:
0.2594
[torch.FloatTensor of size 1]
Average IOU during train Reg0 , epoch 1 is Variable containing:
0.3301
[torch.FloatTensor of size 1]
Average IOU during train Reg0 , epoch 2 is Variable containing:
nan
[torch.FloatTensor of size 1]
Average IOU during train Reg0 , epoch 3 is Variable containing:
nan
[torch.FloatTensor of size 1]
Average IOU during train Reg0 , epoch 4 is Variable containing:
nan
[torch.FloatTensor of size 1]
Although the proper way is to compute the mean and variance over your whole training set and use them to normalise your images (scikit-learn has some classes for this), there is a quicker way to check whether normalisation helps.
Just add a batch normalisation layer (torch.nn.BatchNorm2d) as the very first layer of your network, and it will normalise on a per-batch basis (64 images in your case). It’s not perfect, but it should give enough intuition as to whether this will solve your problem.
And when indeed that helps you can use normalise over your whole training set.
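As a rough sketch of that quick check, the smaller net from this thread could get a `BatchNorm2d` on the raw input, before `conv1`. The class name `NetWithInputNorm`, the attribute name `input_bn`, and the random dummy batch are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NetWithInputNorm(nn.Module):
    """Hypothetical variant of the thread's net with BatchNorm on the raw input."""

    def __init__(self):
        super().__init__()
        self.input_bn = nn.BatchNorm2d(1)  # 1 = number of input channels
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16 * 3 * 3, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.input_bn(x)                  # normalise each batch before conv1
        x = self.pool(F.relu(self.conv1(x)))  # 20x20 -> 18x18 -> 9x9
        x = self.pool(F.relu(self.conv2(x)))  # 9x9 -> 7x7 -> 3x3
        x = x.view(-1, 16 * 3 * 3)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


net = NetWithInputNorm()
batch = torch.rand(64, 1, 20, 20) * 255.0  # unnormalised dummy images (assumption)
out = net(batch)
print(out.shape)  # torch.Size([64, 10])
```

If the NaNs disappear with this single extra layer, that points towards the input scale as the cause.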
Hi Peter
Many thanks for your recommendation.
My code now looks like this:
def __init__(self):
    super(NetDSpace, self).__init__()
    self.conv1 = nn.Conv2d(1, 6, 3)
    self.pool = nn.MaxPool2d(2, 2)
    self.conv2 = nn.Conv2d(6, 16, 3)
    self.conv2_bn = nn.BatchNorm2d(16)
    self.fc1 = nn.Linear(16 * 3 * 3, 120)
    self.fc1_bn = nn.BatchNorm1d(120)
    self.fc2 = nn.Linear(120, 84)
    self.fc3 = nn.Linear(84, 10)

def forward(self, x):
    x = self.pool(F.relu(self.conv1(x)))
    x = self.pool(F.relu(self.conv2(x)))
    x = x.view(-1, 16 * 3 * 3)
    x = F.relu(self.fc1_bn(self.fc1(x)))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x
So is it correct?
I have tested it and I no longer get NaN values, but the loss is very unstable: sometimes it decreases and sometimes it increases.
One question, please: what did you mean by ‘And when indeed that helps you can use normalise over your whole training set.’?
I actually meant that you should add the BatchNorm only as the first layer of your network (so before conv1) in order to “simulate” normalisation. But if this works and avoids the NaN, then your problem (or part of it) does seem to be normalisation or, more correctly, the lack of it.
The downside of BatchNorm is that the normalisation only happens per batch, so 64 images at a time in your case. You normally achieve better results if you normalise based on your complete training set rather than 64 images at a time.
So you calculate the mean and variance over all your training images and then normalise each image using this mean and variance. This would be done in the preprocessing phase (and not as part of your network, as is the case with BatchNorm). I’m guessing some searching on image normalisation should give you some reusable code snippets.
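A minimal sketch of that preprocessing step, assuming the whole training set fits in memory as one tensor of shape N x 1 x 20 x 20. The random `train_images` tensor and the helper name `normalise` are placeholders, not code from this thread. Dividing by the standard deviation (the square root of the variance) is the usual form.

```python
import torch

# Stand-in for the full training set: N single-channel 20x20 images.
# Random values are an assumption; in practice this would be the real data.
torch.manual_seed(0)
train_images = torch.rand(1000, 1, 20, 20) * 255.0

# One mean and one standard deviation per channel, over all images and pixels.
mean = train_images.mean(dim=(0, 2, 3), keepdim=True)
std = train_images.std(dim=(0, 2, 3), keepdim=True)


def normalise(images: torch.Tensor) -> torch.Tensor:
    """Apply the training-set statistics; the epsilon guards against std == 0."""
    return (images - mean) / (std + 1e-8)


normalised = normalise(train_images)
# After normalisation the data has roughly zero mean and unit standard deviation.
print(normalised.mean().item(), normalised.std().item())
```

The same `mean` and `std` computed on the training set should be reused for validation and test images, so that all data passes through an identical transformation.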
I do not get the point: how can image normalization, in theory, help avoid NaNs? Although I understand why normalization is helpful (in terms of stability), even at the first layer of a network, your point seems very strange to me.
Unnormalized inputs could theoretically create large intermediate activations. If the training diverges due to these high values, you might encounter an overflow and could run into NaNs.
This happened a few times in other posts and you would usually see a very high loss in some iterations before it blows up.
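The overflow mechanism can be reproduced in isolation. The following is just an illustrative sketch in float32 (the default dtype for PyTorch tensors); `loss_is_finite` is a hypothetical helper name, not part of PyTorch.

```python
import torch

x = torch.tensor(1e20, dtype=torch.float32)
big = x * x         # 1e40 exceeds the float32 maximum (about 3.4e38)
print(big)          # tensor(inf)
print(big - big)    # tensor(nan): inf - inf is undefined
print(big * 0.0)    # tensor(nan): inf * 0 is undefined


def loss_is_finite(loss: torch.Tensor) -> bool:
    """Return False as soon as the loss contains NaN or Inf."""
    return bool(torch.isfinite(loss).all())


print(loss_is_finite(torch.tensor(0.5)))  # True
print(loss_is_finite(big - big))          # False
```

A check like `loss_is_finite(loss)` before `loss.backward()` makes it possible to stop at the first bad iteration and inspect the loss values that preceded it.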