Why does the ResNet model provided by PyTorch omit biases from its convolutional layers?

barrel-roll · December 10, 2017, 8:35pm

Hi
I was trying to implement my own ResNet model, using the model already provided by PyTorch as a reference. I noticed, however, that none of the convolutional layers have biases, as declared here:

import torch.nn as nn

def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)

Is there any reason for this? I didn’t see any mention of removing biases in the original paper.

nikostr · December 10, 2017, 9:06pm

See author’s answer here:

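The batchnorm layers handle the biases. Each convolution is followed by a batchnorm layer, which subtracts the per-channel mean (so any constant bias added by the convolution is cancelled) and then adds its own learnable shift.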
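
To see why a conv bias is redundant in front of batchnorm, here is a minimal sketch (not taken from the torchvision source). The layer sizes, the input shape and the exaggerated bias range are arbitrary choices for illustration. It passes the same input through two convolutions with identical weights, one with a bias and one without, then through a batchnorm layer in training mode:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two 3x3 convolutions with identical weights; only one has a bias.
conv_bias = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=True)
conv_nobias = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    conv_nobias.weight.copy_(conv_bias.weight)
    conv_bias.bias.uniform_(-5.0, 5.0)  # exaggerated bias, for illustration only

# A freshly built module is in training mode, so it normalizes
# each channel with the statistics of the current batch.
bn = nn.BatchNorm2d(8)

x = torch.randn(4, 3, 16, 16)  # arbitrary batch of inputs
with torch.no_grad():
    out_bias = bn(conv_bias(x))
    out_nobias = bn(conv_nobias(x))

# The per-channel mean subtraction removes the constant bias,
# so both outputs agree up to floating-point error.
max_diff = (out_bias - out_nobias).abs().max().item()
same = torch.allclose(out_bias, out_nobias, atol=1e-4)
print(same, max_diff)
```

Note that this only holds exactly when batchnorm uses batch statistics (training mode). In eval mode the running mean is used instead, but it will have absorbed the bias during training in the same way.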

barrel-roll · December 10, 2017, 9:21pm

Oh! In that case what are the weights in the batchnorm layer?
I was under the impression that bias and weights in the batchnorm were the expectation and variance respectively.

nikostr · December 10, 2017, 9:29pm

In the original batchnorm paper they mention learnable params beta and gamma:

And in PyTorch the batchnorm implementation has a weight and a bias (gamma and beta) in addition to the running mean and running variance.

http://pytorch.org/docs/master/_modules/torch/nn/modules/batchnorm.html

Edit: The weight and bias are used to scale and shift the normalized output of the layer.
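
A quick way to check this yourself, as a rough sketch: the channel count, input shape and the values 2.0 and 0.5 below are arbitrary, chosen only for illustration. It lists the learnable parameters and the buffers of nn.BatchNorm2d, then compares the layer's training-mode output with a manual "normalize, scale, shift" computation:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(4)

# Learnable parameters: weight (gamma, the scale) and bias (beta, the shift).
param_names = [name for name, _ in bn.named_parameters()]
# Buffers: running statistics, not learned by gradient descent.
buffer_names = [name for name, _ in bn.named_buffers()]
print(param_names)
print(buffer_names)

x = torch.randn(8, 4, 5, 5) * 3.0 + 2.0  # arbitrary input
with torch.no_grad():
    bn.weight.fill_(2.0)  # gamma, arbitrary value
    bn.bias.fill_(0.5)    # beta, arbitrary value
    y = bn(x)             # training mode: uses batch statistics

    # Manual computation: per-channel mean and (biased) variance
    # over the batch and spatial dimensions.
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = ((x - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)
    manual = 2.0 * (x - mean) / torch.sqrt(var + bn.eps) + 0.5

matches = torch.allclose(y, manual, atol=1e-4)
print(matches)
```

So the mean and variance are computed from the data (or tracked as running estimates), while weight and bias are separate parameters that are learned.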

barrel-roll · December 10, 2017, 9:44pm

Okay, I get it now.

Thanks! This clears everything up.

Caroline_Pearl · December 20, 2020, 12:10am

See the last sentence of section 3.4 of the paper:
"Whereas Dropout (Srivastava et al., 2014) is typically used to reduce overfitting, in a batch-normalized network we found that it can be either removed or reduced in strength."