I noticed that BatchNorm2d with affine=True uses learnable parameters per channel (64) instead of per input activation (64x32x32). I guess that's intentional to reduce the number of parameters, but am I missing something?
Number of Input Channels = 64
I am using the following ResNet model from -->
This file has been truncated.
'''ResNet in PyTorch.
For Pre-activation ResNet, see 'preact_resnet.py'.
Reference:
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
    Deep Residual Learning for Image Recognition. arXiv:1512.03385
'''
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
That is intentional. Batch norm seems to mean different things to different people when it comes to the specifics…
I have another question: if the learnable (affine) parameters are disabled, are running_mean and running_var tracked per channel or per activation? (I guess per channel.)
Per channel. You can tell by the fact that you don’t actually provide dimensions beyond the number of channels to BN, so it is unaware of e.g. the “image” dimensions you feed through it.
If you do
bn = torch.nn.BatchNorm2d(3, affine=True)
you also have proof that it has three-element vectors for all four state items (weight, bias, running_mean, running_var). Part of the beauty of PyTorch is that you can easily poke the modules to see how they behave.
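For instance, a minimal poke at a BatchNorm2d with 3 channels (a small sketch, not from the thread) shows that all of its state is per channel, regardless of the spatial size you feed through:

```python
import torch

# All four state items are per-channel vectors of length 3.
bn = torch.nn.BatchNorm2d(3, affine=True)

print(bn.weight.shape)        # torch.Size([3])  (learnable scale, gamma)
print(bn.bias.shape)          # torch.Size([3])  (learnable shift, beta)
print(bn.running_mean.shape)  # torch.Size([3])
print(bn.running_var.shape)   # torch.Size([3])

# The module never learns the spatial dimensions; even after a forward
# pass with 32x32 "images", its state stays per-channel.
y = bn(torch.randn(8, 3, 32, 32))
print(y.shape)                # torch.Size([8, 3, 32, 32])
```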
Thanks a lot
I appreciate your explanation.
To compute the per-channel running_mean, does it take the mean across all activations for that channel (pooling all examples in the batch together), or first the mean across each example's activations and then the mean across examples?