Affine parameter in batchnorm

From the documentation of batchnorm: “affine – a boolean value that when set to true, gives the layer learnable affine parameters. Default: True”. So when I set affine=False, are gamma and beta from Ioffe’s paper fixed to 1 and 0, or are they set to the propagated standard deviation and mean?


When affine=False the output of BatchNorm is equivalent to considering gamma=1 and beta=0 as constants.


With affine=False, are gamma and beta still learnable, or are they fixed to the constant values gamma=1, beta=0?
By the way, how can I set the initial values of gamma and beta, if affine=False does not initialize them but fixes them instead?


affine = False is equivalent to simply computing:

y = (x - mu) / sqrt(var + eps)

where mu and var are the mean and variance used for normalization: the statistics of the current batch in training mode, and the running (propagated) mean and variance in eval mode. Equivalently, this can be interpreted as fixing gamma=1 and beta=0. These are then non-trainable: since they don’t appear in the equation above, no gradients are calculated for them.
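To make this concrete, here is a minimal sketch that compares an affine=False layer against the formula computed by hand, in both training and eval mode. The batch size, feature count and seed are arbitrary choices for illustration, not values from this thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 4)  # 8 samples, 4 features (arbitrary sizes)

bn = nn.BatchNorm1d(4, affine=False)
print(bn.weight, bn.bias)  # None None: no gamma/beta parameters exist

# Training mode: normalize with the statistics of the current batch.
bn.train()
y_train = bn(x)
mu = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)  # the biased variance is used for normalization
manual_train = (x - mu) / torch.sqrt(var + bn.eps)

# Eval mode: normalize with the running estimates accumulated so far.
bn.eval()
y_eval = bn(x)
manual_eval = (x - bn.running_mean) / torch.sqrt(bn.running_var + bn.eps)

train_matches = torch.allclose(y_train, manual_train, atol=1e-5)
eval_matches = torch.allclose(y_eval, manual_eval, atol=1e-5)
print(train_matches, eval_matches)
```

Both comparisons should agree up to floating-point tolerance, and the layer exposes no parameters to an optimizer.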

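If you would rather initialize gamma and beta to (1, 0) and train them, you could do something like the following (as far as I know, recent PyTorch versions already use these values by default):

import torch.nn as nn

bn = nn.BatchNorm1d(num_c, affine=True)  # num_c: number of channels/features
nn.init.ones_(bn.weight)   # gamma = 1; assigning a plain int to bn.weight raises a TypeError
nn.init.zeros_(bn.bias)    # beta = 0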

The formula you’ve given would be used for affine=False. I guess you have a typo in your post. :wink:

Ah. My bad. Thanks for pointing that out. :slight_smile:

Hi there,
What if I initialize the model parameters from a pre-trained model (e.g. a .pth file) and want to keep gamma and beta frozen while the mean and var keep updating when I resume training?
Thank you.

You could set the .requires_grad attributes of the .weight and .bias parameters to False and keep this layer in .train() mode.
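A minimal sketch of that suggestion, assuming a standalone BatchNorm1d layer with arbitrary sizes. In a real setup the layer would be part of your model and its values would come from the checkpoint via load_state_dict.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)  # affine=True by default; stands in for a pre-trained layer

# Freeze gamma and beta so no gradients are computed for them.
bn.weight.requires_grad_(False)
bn.bias.requires_grad_(False)

bn.train()  # running statistics are only updated in training mode

weight_before = bn.weight.clone()
mean_before = bn.running_mean.clone()

x = torch.randn(16, 4) + 3.0  # shifted input so the running mean visibly moves
out = bn(x)

weight_unchanged = torch.equal(bn.weight, weight_before)
mean_updated = not torch.equal(bn.running_mean, mean_before)
print(weight_unchanged, mean_updated)
```

The forward pass in training mode updates running_mean and running_var, while the frozen weight and bias receive no gradients and so are skipped by the optimizer.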
