From the BatchNorm documentation: “affine – a boolean value that when set to true, gives the layer learnable affine parameters. Default: True”. So, when I set affine=False, are gamma and beta from Ioffe’s paper fixed to 1 and 0, or are they set to the propagated standard deviation and mean?
When affine=False, the output of BatchNorm is equivalent to treating gamma=1 and beta=0 as constants.
With affine=False, are gamma and beta still learnable, or are they fixed to the constant values gamma=1, beta=0?
By the way, how can I set the initial gamma and beta values, if affine=False does not initialize gamma and beta but fixes them instead?
affine=False is equivalent to simply computing:
y = (x - mu) / sqrt(var + eps)
where mu is the mean and var is the variance: the statistics of the current batch in training mode, and the running (propagated) estimates in evaluation mode. Equivalently, this can be interpreted as fixing gamma=1 and beta=0. (These are then non-trainable: since they don’t appear in the equation above, no gradients are calculated for them.)
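To make the equivalence concrete, here is a minimal pure-Python sketch of the formula above, not PyTorch's actual implementation. The helper names normalize and normalize_affine are hypothetical, and as an assumption for illustration mu and var are computed from the toy batch itself.

```python
import math

def normalize(xs, eps=1e-5):
    # affine=False case: only the normalization step, no gamma/beta.
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)  # biased variance (divide by N)
    return [(x - mu) / math.sqrt(var + eps) for x in xs]

def normalize_affine(xs, gamma, beta, eps=1e-5):
    # affine=True case: the same normalization, then gamma * x_hat + beta.
    return [gamma * x_hat + beta for x_hat in normalize(xs, eps)]

batch = [1.0, 2.0, 3.0, 4.0]  # toy data, one feature
no_affine = normalize(batch)
identity_affine = normalize_affine(batch, gamma=1.0, beta=0.0)

# With gamma=1 and beta=0 the affine step changes nothing.
same = all(math.isclose(a, b) for a, b in zip(no_affine, identity_affine))
print(same)
```

The only difference with affine=True is that gamma and beta become parameters the optimizer can move away from 1 and 0.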
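If you would rather initialize gamma and beta to (1, 0) and still train them, you could do something like:
import torch.nn as nn
bn = nn.BatchNorm1d(num_c, affine=True)  # num_c: number of channels/features
nn.init.ones_(bn.weight)   # gamma = 1; plain `bn.weight = 1` raises a TypeError
nn.init.zeros_(bn.bias)    # beta = 0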
The formula you’ve given would be used for affine=False. I guess you have a typo in your post.
Ah. My bad. Thanks for pointing that out.
Hi there,
What if I initialize the model parameters from a pre-trained model (e.g. a .pth file) and want to keep gamma and beta frozen while the mean and var keep updating when I resume training?
Thank you.
You could set the .requires_grad attributes of the .weight and .bias parameters to False and keep this layer in .train() mode.
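A minimal sketch of that suggestion, assuming a standalone nn.BatchNorm1d layer with 3 features and a random toy batch (both chosen only for illustration; in a real model the layer would come from the loaded checkpoint):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)  # affine=True by default; 3 features is an arbitrary choice

# Freeze gamma (weight) and beta (bias) so no gradients are computed for them.
bn.weight.requires_grad_(False)
bn.bias.requires_grad_(False)

bn.train()  # training mode: running_mean / running_var are still updated

mean_before = bn.running_mean.clone()
x = torch.randn(8, 3) + 5.0  # toy batch with a clearly non-zero mean
y = bn(x)

stats_changed = not torch.equal(mean_before, bn.running_mean)
affine_frozen = not bn.weight.requires_grad and not bn.bias.requires_grad
print(stats_changed, affine_frozen)
```

The running statistics are buffers rather than parameters, so they are updated by the forward pass in training mode regardless of requires_grad.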