I was trying to implement my own ResNet model, using the one already provided by PyTorch (torchvision) as a reference. However, I noticed that none of the convolutional layers have biases, as declared here:
```python
import torch.nn as nn

def conv3x3(in_planes, out_planes, stride=1):
    "3x3 convolution with padding"
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)
```
Is there any reason for this? I didn’t see any mention of removing biases in the original paper.
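For context, here is a minimal sketch of how a bias-free `conv3x3` is typically used in a ResNet basic block: every 3x3 convolution is immediately followed by a `BatchNorm2d` layer. This block is simplified for illustration (identity shortcut only, so input and output channels must match and the stride is 1). torchvision's actual `BasicBlock` also supports a `downsample` branch for the shortcut, so treat this as an approximation rather than the library code.

```python
import torch
import torch.nn as nn


def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding, without a bias term."""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


class BasicBlock(nn.Module):
    """Simplified ResNet basic block (identity shortcut only, so the
    number of input channels must equal `planes` and stride is 1)."""

    def __init__(self, planes):
        super().__init__()
        # Each bias-free convolution is immediately followed by BatchNorm.
        self.conv1 = conv3x3(planes, planes)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Add the shortcut before the final nonlinearity.
        return self.relu(out + identity)


block = BasicBlock(16)
y = block(torch.randn(4, 16, 8, 8))
print(y.shape)  # torch.Size([4, 16, 8, 8])
```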
See the author’s answer here:
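The batchnorm layers handle the biases: every convolution is followed by a BatchNorm layer, and its per-channel mean subtraction cancels out any constant bias the convolution would have added.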
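A quick numerical check of this claim, as a sketch. Two convolutions with identical weights, one with a deliberately large bias and one without, produce the same output once passed through a `BatchNorm2d` in training mode, because the per-channel mean subtraction removes any constant offset. The seed, shapes, and bias value are arbitrary choices for the demo.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 16, 16)

conv_bias = nn.Conv2d(3, 4, kernel_size=3, padding=1, bias=True)
conv_nobias = nn.Conv2d(3, 4, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    # Same weights; the only difference is the bias term.
    conv_nobias.weight.copy_(conv_bias.weight)
    conv_bias.bias.fill_(5.0)  # a large, obviously non-zero bias

bn = nn.BatchNorm2d(4)  # training mode: normalizes with batch statistics
out_with_bias = bn(conv_bias(x))
out_without_bias = bn(conv_nobias(x))

# The per-channel mean subtraction removes the constant bias.
print(torch.allclose(out_with_bias, out_without_bias, atol=1e-4))  # True
```

The check uses batch statistics. In eval mode the layer subtracts its running mean instead, which is estimated from the same biased activations, so the convolution bias is still absorbed.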
Oh! In that case, what are the weights in the batchnorm layer?
I was under the impression that the bias and weights in the batchnorm layer were the expectation (mean) and variance, respectively.
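In the original batchnorm paper they mention learnable parameters γ (gamma) and β (beta). Each activation is first normalized with the mini-batch mean μ_B and variance σ_B², x̂ = (x − μ_B) / √(σ_B² + ε), and is then scaled and shifted: y = γ·x̂ + β.
In PyTorch, the batchnorm layer stores γ and β as its learnable `weight` and `bias`, in addition to the running mean and running variance (`running_mean`, `running_var`), which are non-learnable buffers.
Edit: The weight and bias are used to scale and shift the normalized output of the layer.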
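A sketch that makes the distinction concrete, assuming a recent PyTorch: `weight` and `bias` appear as learnable parameters, while the mean and variance are buffers. The manual computation reproduces the layer's training-mode output as γ·x̂ + β, with x̂ normalized per channel by the batch mean and the biased batch variance. The γ and β values are arbitrary, chosen only so that they differ from the defaults of 1 and 0.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)

# Learnable parameters: weight = gamma (scale), bias = beta (shift)
param_names = [name for name, _ in bn.named_parameters()]
print(param_names)  # ['weight', 'bias']
# Non-learnable buffers: running estimates used in eval mode
buffer_names = [name for name, _ in bn.named_buffers()]
print(buffer_names)  # ['running_mean', 'running_var', 'num_batches_tracked']

with torch.no_grad():
    bn.weight.copy_(torch.tensor([2.0, 0.5, 1.0]))  # gamma
    bn.bias.copy_(torch.tensor([1.0, -1.0, 0.0]))   # beta

x = torch.randn(4, 3, 5, 5)
y = bn(x)  # training mode: uses the batch mean and variance

# Manual batchnorm: normalize per channel, then scale by gamma and shift by beta.
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = ((x - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)  # biased variance
x_hat = (x - mean) / torch.sqrt(var + bn.eps)
y_manual = bn.weight.view(1, -1, 1, 1) * x_hat + bn.bias.view(1, -1, 1, 1)

print(torch.allclose(y, y_manual, atol=1e-5))  # True
```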
Okay, I get it now.
Thanks! This clears everything up.
In the batchnorm paper, section 3.4, last sentence:
"Whereas Dropout (Srivastava et al., 2014) is typically used to reduce overfitting, in a batch-normalized network we found that it can be either removed or reduced in strength."
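Section 3.4 attributes this regularizing effect to the fact that, during training, each example is normalized together with the other examples in its mini-batch, so the network no longer produces a deterministic output for a given example. A small sketch of that effect, with arbitrary sizes and seed: in training mode the same sample gets different outputs depending on its batch companions, while in eval mode, where the running statistics are used, it does not.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)  # modules start in training mode
sample = torch.randn(1, 4)

# The same sample, normalized together with two different random mini-batches.
out_a = bn(torch.cat([sample, torch.randn(7, 4)]))[0]
out_b = bn(torch.cat([sample, torch.randn(7, 4)]))[0]
print(torch.allclose(out_a, out_b))  # False: output depends on the rest of the batch

bn.eval()  # eval mode: uses the fixed running statistics instead
out_c = bn(torch.cat([sample, torch.randn(7, 4)]))[0]
out_d = bn(torch.cat([sample, torch.randn(7, 4)]))[0]
print(torch.allclose(out_c, out_d))  # True
```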