Could you explain your concern a bit?
Usually the bias is removed from conv layers that are directly followed by a batch norm layer, since the batch norm's beta parameter (the `bias` of `nn.BatchNorm`) has the same effect, and the bias of the conv layer would be canceled out by the mean subtraction anyway.
From the batch norm paper:

> Note that, since we normalize Wu+b, the bias b can be ignored since its effect will be canceled by the subsequent mean subtraction (the role of the bias is subsumed by β in Alg. 1).
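As a quick sanity check, here is a minimal sketch (assuming `nn.Conv2d` followed by `nn.BatchNorm2d` in training mode) showing that two convs with identical weights but different biases produce the same output after batch norm, because the per-channel mean subtraction removes the constant shift:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two convs with identical weights; one keeps its bias, one drops it.
conv_bias = nn.Conv2d(3, 8, kernel_size=3, bias=True)
conv_nobias = nn.Conv2d(3, 8, kernel_size=3, bias=False)
with torch.no_grad():
    conv_nobias.weight.copy_(conv_bias.weight)

bn = nn.BatchNorm2d(8)
bn.train()  # normalize with the current batch statistics

x = torch.randn(4, 3, 16, 16)
out_with_bias = bn(conv_bias(x))
out_without_bias = bn(conv_nobias(x))

# The per-channel bias only shifts the mean, which batch norm subtracts,
# so both outputs match (up to floating point noise).
print(torch.allclose(out_with_bias, out_without_bias, atol=1e-5))  # True
```

This is why conv layers followed by batch norm are typically created with `bias=False`, which saves a few redundant parameters without changing the model.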