Convolution Layers no Bias with batchnorm

nicozhou · February 23, 2022, 6:10pm

Hi, I'm just wondering: is it better to use bias=False when using batchnorm? Will our network break if we don't disable the bias?

Yuv · February 23, 2022, 10:19pm

It’s very hard to answer your question without more details about the problem you refer to.
Generally, removing the bias may harm performance, but you can check by trying it.

nicozhou · February 23, 2022, 10:49pm

Yeah, so I was really confused, as some say that:

Note that, since we normalize Wu+b, the bias b can be ignored since its effect will be canceled by the subsequent mean subtraction (the role of the bias is subsumed by β in Alg. 1).

This is what I got from the batchnorm paper.

Yuv · February 23, 2022, 11:06pm

When using batch normalization you have a learnable parameter β, which plays the same role as the bias does when batch normalization is not used.
Adding a bias term to Wx adds a constant to the batch mean computed by the batch normalization algorithm, but that constant vanishes in the subsequent mean subtraction. That is why they ignore the bias, and it is the purpose of the learnable parameter β.
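
To make the cancellation concrete, here is a minimal pure-Python sketch. The helper `batchnorm_1d` and the values of `w`, `b`, and `xs` are made up for illustration; it normalizes a batch of scalars with batch statistics and compares the result with and without the bias.

```python
import math

def batchnorm_1d(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalars with its own mean/variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased (population) variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

w, b = 0.7, 3.0             # hypothetical weight and bias
xs = [-1.2, 0.4, 2.5, 0.9]  # hypothetical batch of inputs

with_bias = batchnorm_1d([w * x + b for x in xs])
without_bias = batchnorm_1d([w * x for x in xs])

# Largest element-wise difference between the two normalized batches.
max_diff = max(abs(p - q) for p, q in zip(with_bias, without_bias))
print(max_diff)  # only floating-point rounding noise remains
```

Any constant offset you still want after normalization has to come from `beta`, which is why the bias in front of the normalization is redundant.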

nicozhou · February 24, 2022, 1:05am

So, it is safe to conclude that it is better to disable bias if we use batchnorm?

Eta_C · February 24, 2022, 5:55am

the bias b can be ignored since its effect will be canceled by the subsequent mean subtraction

# Convolution
y = w * x + b 

# Batch Normalization (without momentum updating)
z = (y - E(y_i)) / STD(y_i)

# note that b cancels in the numerator and never enters STD
y - E(y_i) = w * x + b - E(w * x_i + b) = w * x - E(w * x_i)
STD(y_i) = Sqrt(E((y_i - E(y_i))^2) + epsilon)

So, it is safe to conclude that it is better to disable bias if we use batchnorm?

If you use BatchNorm directly after the convolution, the answer is YES: the bias is redundant there, so bias=False drops parameters that have no effect.
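
A quick way to check this in PyTorch is sketched below. The layer sizes, the input shape, and the exaggerated bias range are arbitrary choices for the demo; the two convolutions share the same weights and differ only in whether they have a bias.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two conv layers with identical weights; only one of them has a bias.
conv_bias = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=True)
conv_nobias = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    conv_nobias.weight.copy_(conv_bias.weight)
    conv_bias.bias.uniform_(-5.0, 5.0)  # exaggerate the bias so any effect would be visible

bn = nn.BatchNorm2d(8)
bn.train()  # training mode: normalize with the statistics of the current batch

x = torch.randn(4, 3, 16, 16)
out_bias = bn(conv_bias(x))
out_nobias = bn(conv_nobias(x))

# Compare the two outputs up to float32 rounding error.
same_in_training = torch.allclose(out_bias, out_nobias, atol=1e-4)
print(same_in_training)
```

Note that this equality check applies to training mode only. In eval mode BatchNorm uses its running statistics instead of the batch statistics, so feeding both convolutions through the same BatchNorm layer would no longer give matching outputs.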