I made a module that uses the following MLP module:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self, size_layers, activation):
        super(MLP, self).__init__()
        self.layers = []
        self.layersnorm = []
        self.activation = activation
        for i in range(len(size_layers) - 1):
            self.layers.append(nn.Linear(size_layers[i], size_layers[i + 1]))
            # register manually, since plain Python lists are invisible to nn.Module
            self.add_module('layers_' + str(i), self.layers[-1])
            self.layersnorm.append(nn.BatchNorm1d(size_layers[i + 1]))
            self.add_module('BatchNorm1d_' + str(i), self.layersnorm[-1])

    def forward(self, x):
        for i in range(len(self.layers) - 1):
            if self.activation == 'relu':
                x = F.relu(self.layersnorm[i](self.layers[i](x)))
            elif self.activation == 'lrelu':
                x = F.leaky_relu(self.layersnorm[i](self.layers[i](x)))
            elif self.activation == 'tanh':
                x = torch.tanh(self.layersnorm[i](self.layers[i](x)))
        # final layer: batch norm but no activation
        x = self.layersnorm[-1](self.layers[-1](x))
        return x

    def l1reg(self):
        # L1 penalty over all linear weights
        w = 0.
        for i in range(len(self.layers)):
            w = w + torch.sum(self.layers[i].weight.abs())
        return w
Everything works fine without batch normalization.
With batch normalization the training seems to work, but evaluation (using model.eval()) produces NaN.
Is there something I'm doing wrong with batch normalization?
Could you provide a minimal script that reproduces the problem?
I can imagine getting NaN in training mode if all the elements of the batch are zero: the mean and the std over the batch would then be zero as well, leading to NaN.
Thanks for the reply. The MLP code works independently but not inside another module; I'll check what is wrong.
In any case, without model.eval() it works fine. This is strange.
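For what it's worth, here is a minimal sketch (not the thread's code) that reproduces this train/eval asymmetry by manually poisoning the running statistics, which is what a bad batch does during training:

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
bn.running_var.fill_(float('nan'))  # simulate running stats poisoned during training

x = torch.randn(8, 4)
bn.train()
print(torch.isnan(bn(x)).any())  # tensor(False): train mode normalizes with the batch's own stats
bn.eval()
print(torch.isnan(bn(x)).any())  # tensor(True): eval mode uses the NaN running stats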
Where do I find an example or documentation of nn.ModuleList?
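For reference, nn.ModuleList is documented with the other containers in the torch.nn docs. A sketch (a hypothetical rewrite, not the original code) of the layer registration above using it, which replaces the manual add_module calls:

import torch.nn as nn

class MLPList(nn.Module):  # hypothetical variant of the MLP above
    def __init__(self, size_layers):
        super().__init__()
        # nn.ModuleList registers each submodule automatically, so
        # .parameters(), .train()/.eval() and .to(device) all see them
        self.layers = nn.ModuleList(
            nn.Linear(size_layers[i], size_layers[i + 1])
            for i in range(len(size_layers) - 1))
        self.layersnorm = nn.ModuleList(
            nn.BatchNorm1d(size_layers[i + 1])
            for i in range(len(size_layers) - 1))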
Thank you!
I also have that problem. I wanted to suggest increasing eps, which temporarily seemed to fix the issue, but it didn't. Is there any suggestion on how to debug this? My input is fine (no NaNs).
I ran into a similar problem - I am using BatchNorm1d with a batch size of 1, which always results in running_var values that are NaN. Specifically, this only occurs with a batch of size 1.
This problem doesn't occur with BatchNorm2d.
I thought it was possibly due to the eps value, as someone suggested above, but this wouldn't explain why it's OK for 2d cases and why it doesn't produce NaNs for the first stddev calculation.
EDIT:
I presume the NaN isn't a result of performing 1 / (0 + eps)? Where the 0 arises because the variance is computed from a single example.
For example:
import torch
import torch.nn as nn
from torch.autograd import Variable  # legacy wrapper; plain tensors work in recent PyTorch

input = torch.FloatTensor(1, 4).normal_(0, 1)  # a single sample: batch size 1
bn = nn.BatchNorm1d(4)
output = bn(Variable(input))  # note: newer PyTorch versions raise an error here for size-1 batches in train mode
print("output ...\n", output)
print("running mean ...\n", bn.running_mean)
print("running var ...\n", bn.running_var)  # NaN
A guess would be that BatchNorm uses Bessel's correction for the variance, and this makes it NaN (the computed variance is 0, and n / (n - 1) * var = 1 / 0 * 0 = NaN).
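That guess can be checked directly: the unbiased (Bessel-corrected) variance of a single sample divides by n - 1 = 0. A quick sketch:

import torch

x = torch.randn(1, 4)
print(x.var(dim=0, unbiased=True))   # tensor([nan, nan, nan, nan]): divides by n - 1 = 0
print(x.var(dim=0, unbiased=False))  # tensor([0., 0., 0., 0.]): the biased estimate is defined

This would also explain why BatchNorm2d is unaffected: each channel's statistics are computed over all N * H * W elements, so n > 1 even with N = 1, as long as the feature map is bigger than 1x1.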
About BatchNorm2d with batch size = 1: does it work for you even in eval() mode? I'm currently using BatchNorm2d with batch size = 1, but I have to stay in train() mode, otherwise the accuracy drops dramatically.
From the Batch Normalization paper: "A model employing Batch Normalization can be trained using batch gradient descent, or Stochastic Gradient Descent with a mini-batch size m > 1".
This is because of Bessel's correction, as pointed out by Adam above:
"A guess would be that BatchNorm uses Bessel's correction for the variance, and this makes it NaN (the computed variance is 0, and n / (n - 1) * var = 1 / 0 * 0 = NaN)."
So if you can afford to use a batch size > 1, that would solve the NaN problem for you.
If you are using a very small batch size or non-i.i.d. batches, maybe you could look at Batch Renormalization (https://arxiv.org/pdf/1702.03275.pdf).
In that case there is some other problem, most probably with your data. Batchnorm by itself will not give NaN for batch sizes greater than 1. Did you scale your data? If in training you were using floats in the range 0-1 and at test time ints in 0-65535, your network might blow up.
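To illustrate the kind of mismatch meant here (a made-up sketch; the 16-bit range is just the example above):

import torch

raw = torch.randint(0, 65536, (8, 4))  # e.g. raw 16-bit integer inputs at test time
x = raw.float() / 65535.0              # rescale to the same [0, 1] range used in training
assert 0.0 <= x.min() and x.max() <= 1.0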
I have solved the problem in my case.
My len(train_data) = 55937 and my batch size = 64 >> 1, so it looks like there should be no problem.
But I found that 55937 % 64 = 1, which means the last batch has size 1,
so running_var becomes NaN after 1 epoch.
Hope it helps you.
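The usual fix, as a sketch (assuming train_data is a standard Dataset): drop the incomplete final batch so no size-1 batch ever reaches the BatchNorm layers:

from torch.utils.data import DataLoader

# 55937 % 64 == 1, so without drop_last the final batch holds a single sample
loader = DataLoader(train_data, batch_size=64, shuffle=True, drop_last=True)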
But I want to ask: is the n here the .num_batches_tracked in the BatchNorm parameters?
And why do I still get NaN when my batch size is not 1?