What does requires_grad=False on BatchNorm2d do?

Hi everyone, I have a question regarding BatchNorm2d.

What changes happen in the model if during training I set requires_grad=False on BatchNorm2d layers?
I read that running_mean and running_var are buffers and do not require gradients. Is that true? If so, what difference does setting requires_grad=False make on BatchNorm2d, as opposed to requires_grad=True?

Thanks in advance!

Yes, that’s true: running_mean and running_var are buffers and don’t receive gradients. They are updated in each forward pass using the batch statistics whenever the module is in training mode.
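A minimal sketch of this distinction: the running stats show up under named_buffers(), not named_parameters(), and they change during a training-mode forward pass even inside torch.no_grad(), since no gradients are involved at all.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)

# running_mean / running_var are buffers; weight / bias are parameters
buffer_names = [name for name, _ in bn.named_buffers()]
param_names = [name for name, _ in bn.named_parameters()]
print(buffer_names)  # ['running_mean', 'running_var', 'num_batches_tracked']
print(param_names)   # ['weight', 'bias']

# In training mode a forward pass updates the running stats
# even under no_grad() -- no autograd machinery is needed for them.
bn.train()
with torch.no_grad():
    bn(torch.randn(8, 3, 4, 4))
print(bn.running_mean)  # no longer all zeros
```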

By default batchnorm layers will contain trainable parameters (weight and bias), which will get gradients and will thus be updated. Setting their requires_grad attribute to False would freeze these parameters.
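For example, a small sketch of freezing only the batchnorm parameters in a model (the surrounding conv layer here is just a placeholder):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

# Freeze weight (gamma) and bias (beta) of every BatchNorm2d layer
for module in model.modules():
    if isinstance(module, nn.BatchNorm2d):
        for p in module.parameters():
            p.requires_grad = False

out = model(torch.randn(2, 3, 4, 4)).sum()
out.backward()

bn = model[1]
print(bn.weight.grad)       # None: frozen parameters get no gradient
print(model[0].weight.grad is not None)  # conv still gets gradients

# Note: the running stats are still updated while the module is in
# training mode; call bn.eval() as well if you want them frozen too.
```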

Ok thanks!

However, do these parameters (weight and bias) influence the output of the BatchNorm2d layer, or are they just there for consistency with other layers’ implementations?
I’m asking because, looking at the formula, BatchNorm2d seems to need only the running stats and the expected mean/variance, with no weight and bias.

Thanks in advance!

See the batch norm formula: y = (x - E[x]) / sqrt(Var[x] + eps) * gamma + beta.
Gamma and beta are the weight and bias, respectively, so they do affect the output.
At training time, the forward pass estimates E[x] and Var[x] from the batch samples.
At test time, calling model.eval() changes the forward pass to use the running stats instead of the batch E[x] and Var[x].
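A small sketch of the train/eval difference: in training mode the batch is normalized with its own statistics (so the output mean is ~0), while in eval mode the stored running stats are used, which generally differ from the current batch stats.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(1)
x = torch.randn(4, 1, 2, 2) * 5 + 3  # batch with mean ~3, std ~5

bn.train()
y_train = bn(x)   # normalized with this batch's E[x] and Var[x]

bn.eval()
y_eval = bn(x)    # normalized with running_mean / running_var

print(y_train.mean())  # ~0: batch stats remove the batch mean
print(y_eval.mean())   # clearly nonzero: running stats still differ
```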

As @mMagmer explained, gamma=weight and beta=bias will be used in the default setup, unless you create the batchnorm layers with affine=False.
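This can be checked numerically: a layer with affine=True applied to some input should equal the affine=False output scaled by gamma and shifted by beta (gamma/beta are set to arbitrary values here just to make the effect visible).

```python
import torch
import torch.nn as nn

x = torch.randn(4, 2, 3, 3)

bn_plain = nn.BatchNorm2d(2, affine=False)   # normalization only
bn_affine = nn.BatchNorm2d(2, affine=True)   # normalization + gamma/beta
with torch.no_grad():
    bn_affine.weight.fill_(2.0)  # gamma
    bn_affine.bias.fill_(0.5)    # beta

bn_plain.train()
bn_affine.train()
x_hat = bn_plain(x)
y = bn_affine(x)

# y == gamma * x_hat + beta elementwise
print(torch.allclose(y, 2.0 * x_hat + 0.5, atol=1e-5))  # True
```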

Ok now everything is clear, thank you both! @mMagmer @ptrblck