# My Self-Implemented BatchNorm + ReLU Gives NaN

Here is my own implementation of the BatchNorm forward function:

```python
def forward(self, x):
    self._check_input_dim(x)

    if self.training:
        N, C, H, W = x.size()
        x = x.transpose(0, 1).contiguous().view(C, -1)
        mu = x.mean(1, keepdim=True)
        sigma = x.var(1, keepdim=True)

        self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mu
        self.running_var = (1 - self.momentum) * self.running_var + self.momentum * sigma

        x = x - mu
        x = x / (sigma.sqrt() + self.eps)
        x = x * self.weight + self.bias
        x = x.view(C, N, H, W).transpose(0, 1)
        return x

    else:
        N, C, H, W = x.size()
        x = x.transpose(0, 1).contiguous().view(C, -1)
        x = (x - self.running_mean) / (self.running_var.sqrt() + self.eps)
        x = x * self.weight + self.bias
        x = x.view(C, N, H, W).transpose(0, 1)
        return x
```

However, during training on CIFAR-10 (ResNet50, trained from scratch), it always breaks down after running for some epochs and gives a NaN loss. After debugging, I found that the error comes from the affine operation `x = x * self.weight + self.bias`, which produces all negative values. After the ReLU layer you get all 0s, and after the next Conv layer you get NaN.

Why does a Conv layer give NaN for an all-0 input?
How should this problem be solved? Should I add eps after the affine transformation?

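Problem fixed by changing `x = x / (sigma.sqrt() + self.eps)` to `x = x / (sigma + self.eps).sqrt()`.
This is because `sqrt(x^2) = |x|` has no derivative at x = 0, while `1 / sqrt(x^2 + eps)` does. In terms of the code, `sigma` holds the variance, and the gradient of `sqrt(v)` is `1 / (2 * sqrt(v))`, which blows up as v goes to 0. So when a channel's variance is exactly 0 (for example, when its input is all 0s after ReLU), the backward pass through `sigma.sqrt()` yields inf or NaN gradients, whereas with eps inside the square root the gradient stays finite.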
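
To make the difference concrete, here is a minimal sketch in plain Python (no PyTorch required) that evaluates the analytic derivative of each normalization factor with respect to the variance. The helper names `grad_eps_outside` and `grad_eps_inside` are made up for illustration and are not part of the module above; `EPS = 1e-5` is an assumed value, chosen to match the default `eps` of `nn.BatchNorm2d`.

```python
import math

EPS = 1e-5  # assumed value for illustration (default eps of nn.BatchNorm2d)


def grad_eps_outside(var, eps=EPS):
    """d/dvar of 1 / (sqrt(var) + eps): the original formulation."""
    if var == 0.0:
        # d/dvar sqrt(var) = 1 / (2 * sqrt(var)) diverges at var = 0,
        # so the chain rule gives an infinite gradient here.
        return -math.inf
    root = math.sqrt(var)
    return -1.0 / (2.0 * root * (root + eps) ** 2)


def grad_eps_inside(var, eps=EPS):
    """d/dvar of 1 / sqrt(var + eps): the fixed formulation."""
    return -0.5 * (var + eps) ** -1.5


# Compare both gradients as the variance shrinks to exactly zero.
for var in (1.0, 1e-4, 1e-8, 0.0):
    print(
        f"var={var:g}  "
        f"eps outside: {grad_eps_outside(var):.3e}  "
        f"eps inside: {grad_eps_inside(var):.3e}"
    )
```

As the variance approaches 0, the gradient of the original form grows without bound and is infinite at exactly 0, while the fixed form is bounded in magnitude by `0.5 * EPS ** -1.5`. This only models the scalar derivative, not autograd itself, but it is the same quantity the backward pass has to compute for a zero-variance channel.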
