Trying to understand the relation between pytorch batchnorm and caffe batchnorm

This question stems from comparing Caffe's batch normalization layer with PyTorch's. To give a concrete example, consider the ResNet50 architecture in Caffe (prototxt link). There, each "BatchNorm" layer is followed by a "Scale" layer, while the PyTorch ResNet50 model has only "BatchNorm2d" layers (with no separate "Scale" layer). If, in particular, I compare the first batchnorm layer in the PyTorch model with the first BatchNorm+Scale pair in the Caffe model, I get the following.

PyTorch:

Param Name       size
==========       ====
bn1.weight       torch.Size([64])
bn1.bias         torch.Size([64])
bn1.running_mean torch.Size([64])
bn1.running_var  torch.Size([64])

Caffe:

Param Name       size
==========       ====
bn_conv1[0]      (64,)
bn_conv1[1]      (64,)
bn_conv1[2]      (1,)
scale_conv1[0]   (64,)
scale_conv1[1]   (64,)

My question is what is the correspondence between these parameters (basically which one in caffe is what in pytorch)?

I also have a second question, regarding the 'affine' argument of the BatchNorm2d module in PyTorch. Does setting it to False mean \gamma = 1, \beta = 0?
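A quick way to check this in code (a sketch; the input values are arbitrary): with affine=False the layer has no learnable weight or bias at all, and its output matches an affine BatchNorm2d left at its default initialization (\gamma = 1, \beta = 0):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 3, 4, 4)  # dummy input batch

bn_affine = nn.BatchNorm2d(3)                   # learnable gamma (init 1), beta (init 0)
bn_no_affine = nn.BatchNorm2d(3, affine=False)  # no gamma/beta parameters at all

print(bn_no_affine.weight, bn_no_affine.bias)        # None None
print(torch.allclose(bn_affine(x), bn_no_affine(x)))  # True at initialization
```

So affine=False is equivalent to fixing \gamma = 1 and \beta = 0, except that no parameters exist to be trained.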


I think that
bn_conv1[0] -> bn1.running_mean torch.Size([64])
bn_conv1[1] -> bn1.running_var torch.Size([64])
scale_conv1[0] (64,) -> bn1.weight torch.Size([64])
scale_conv1[1] (64,) -> bn1.bias torch.Size([64])
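Under that mapping, and assuming the third BatchNorm blob is Caffe's scalar moving-average factor (Caffe stores unnormalized accumulators and divides the mean/variance by this factor at inference time), loading the Caffe parameters into a PyTorch layer could be sketched like this (the blob values below are random placeholders, not real weights):

```python
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)

# Placeholder blob values standing in for the learned Caffe parameters
bn_mean   = rng.standard_normal(64).astype(np.float32)          # bn_conv1[0]
bn_var    = np.abs(rng.standard_normal(64)).astype(np.float32)  # bn_conv1[1]
bn_factor = np.float32(999.982)                                 # bn_conv1[2]
gamma     = rng.standard_normal(64).astype(np.float32)          # scale_conv1[0]
beta      = rng.standard_normal(64).astype(np.float32)          # scale_conv1[1]

bn = nn.BatchNorm2d(64)
with torch.no_grad():
    # Caffe normalizes the accumulated statistics by blob [2]
    s = 0.0 if bn_factor == 0 else 1.0 / bn_factor
    bn.running_mean.copy_(torch.from_numpy(bn_mean) * s)
    bn.running_var.copy_(torch.from_numpy(bn_var) * s)
    bn.weight.copy_(torch.from_numpy(gamma))  # Caffe Scale weight -> gamma
    bn.bias.copy_(torch.from_numpy(beta))     # Caffe Scale bias   -> beta
bn.eval()  # use the loaded running statistics at inference time
```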

I also got confused by the extra parameter bn_conv1[2]. In my Caffe model, I do not have this parameter.