Hi, I'm just wondering: is it better to use bias=False when using BatchNorm? Will our network break if we don't disable the bias?

It’s very hard to answer your question without more details about the problem you refer to.

Generally, removing the bias may hurt performance, but you can check by trying it.

Yeah, I was really confused, because some people say that:

Note that, since we normalize Wu+b, the bias b can be ignored since its effect will be canceled by the subsequent mean subtraction (the role of the bias is subsumed by β in Alg. 1).

This is what I got from the BatchNorm paper.

When using batch normalization, you have a learnable parameter β, which plays the same role that the bias plays without batch normalization.
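A minimal sketch, assuming PyTorch (the `bias=False` argument in the question suggests it): `nn.BatchNorm2d` stores γ as its `weight` and β as its `bias`, one value per channel. Setting β to some per-channel constants shifts every output value of that channel by exactly that constant, which is what a bias does. The tensor sizes and β values are arbitrary illustrations.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(num_features=3)  # gamma -> bn.weight, beta -> bn.bias, each of shape (3,)
x = torch.randn(4, 3, 5, 5)          # (batch, channels, height, width)

with torch.no_grad():
    out_default = bn(x)              # beta is initialised to zeros
    bn.bias.copy_(torch.tensor([1.0, -2.0, 0.5]))  # pick an arbitrary per-channel beta
    out_shifted = bn(x)              # training mode: same batch statistics as before

# beta adds the same constant to every value of its channel, i.e. it acts as a per-channel bias
shift = out_shifted - out_default
print(shift.mean(dim=(0, 2, 3)))     # approximately tensor([ 1.0, -2.0,  0.5])
```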

Adding a bias term to Wx produces an extra term when the batch normalization algorithm computes the batch mean, but that term vanishes in the subsequent mean subtraction. That is why the paper ignores the biases, and the learnable β takes over their role.
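One way to check this empirically, as suggested earlier in the thread, is a minimal PyTorch sketch: run the same input through a convolution followed by BatchNorm in training mode, once with the convolution's bias and once with that bias zeroed out, and compare. The layer sizes and input shape are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=True)
bn = nn.BatchNorm2d(8)             # training mode: normalises with the batch statistics
x = torch.randn(16, 3, 32, 32)

with torch.no_grad():
    out_with_bias = bn(conv(x))
    conv.bias.zero_()              # drop the bias but keep the same weights
    out_without_bias = bn(conv(x))

# the bias is a per-channel constant, so the per-channel mean subtraction removes it
max_diff = (out_with_bias - out_without_bias).abs().max().item()
print(f"max abs difference: {max_diff:.2e}")  # only floating-point rounding remains
```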

So, is it safe to conclude that it is better to disable the bias if we use BatchNorm?

the bias b can be ignored since its effect will be canceled by the subsequent mean subtraction

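```
# Convolution (for each sample x_i in the batch)
y_i = w * x_i + b
# Batch Normalization (training mode, without momentum updating)
z_i = (y_i - E(y)) / STD(y)
# note that the bias cancels in the centered term
y_i - E(y) = w * x_i + b - E(w * x + b) = w * x_i - E(w * x)
# so the standard deviation does not depend on b either
STD(y) = Sqrt(E((y - E(y))^2) + epsilon)
# finally BN applies its own affine step; beta takes over the role of b
out_i = gamma * z_i + beta
```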
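The derivation above can be checked numerically with a small sketch that follows it line by line: a single scalar weight `w` stands in for the convolution, and `batch_norm` is a hypothetical hand-written training-mode BatchNorm (batch mean, biased variance, then γ and β). The batch size and bias values are arbitrary, and float64 keeps rounding error negligible.

```python
import torch

torch.manual_seed(0)
eps = 1e-5
w = torch.randn((), dtype=torch.float64)     # scalar weight: y_i = w * x_i + b
x = torch.randn(1000, dtype=torch.float64)   # the batch x_1 ... x_N

def batch_norm(y, gamma=1.0, beta=0.0):
    # hypothetical manual BatchNorm: z_i = (y_i - E(y)) / Sqrt(Var(y) + eps), then gamma * z_i + beta
    mean = y.mean()
    var = ((y - mean) ** 2).mean()
    return gamma * (y - mean) / torch.sqrt(var + eps) + beta

z_no_bias = batch_norm(w * x)
diffs = {}
for b in (0.3, -7.0, 10.0):
    z = batch_norm(w * x + b)
    diffs[b] = (z - z_no_bias).abs().max().item()  # ~0: b cancels in both E(y) and STD(y)
print(diffs)
```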

So, is it safe to conclude that it is better to disable the bias if we use BatchNorm?

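If you use BatchNorm **directly after the convolution**, the answer is YES: the convolution's bias is a per-channel constant that BatchNorm's mean subtraction cancels, and BatchNorm's β takes over its role.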
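In PyTorch this usually looks like the sketch below; `conv_bn_relu` is a hypothetical helper name, and the channel counts are arbitrary. The convolution is created with `bias=False`, and the only per-channel offset left in the block is BatchNorm's β.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch: int, out_ch: int) -> nn.Sequential:
    # hypothetical helper: the conv bias is disabled because the BatchNorm right
    # after it would cancel it anyway; BatchNorm's beta (bn.bias) replaces it
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

block = conv_bn_relu(3, 64)
n_params = sum(p.numel() for p in block.parameters())
print(n_params)  # 1856: conv weights 64*3*3*3 = 1728, plus BN gamma and beta 2*64 = 128
out = block(torch.randn(2, 3, 8, 8))
```

The cancellation relies on nothing sitting between the convolution and the BatchNorm. With an order such as conv → ReLU → BatchNorm, the bias changes what the ReLU passes through, so it is no longer redundant.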