BatchNormalization has two trainable parameters (weight and bias, i.e. γ and β) and two mini-batch-dependent statistics (running mean and variance). Your code snippet therefore freezes all layers *except* the BatchNormalization layers, which can be useful for fine-tuning. Conversely, setting `requires_grad = False` on the BatchNormalization layers would freeze only their two trainable parameters.
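
For illustration, here is a minimal sketch of both variants in PyTorch. The helper names and the toy model are placeholders of my own, not the code from your snippet; note also that the running mean/variance are buffers, so `requires_grad` does not affect them and the layer must additionally be put in `eval()` mode if you want them fixed.

```python
import torch.nn as nn

_BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def freeze_all_but_batchnorm(model: nn.Module) -> None:
    """Freeze every parameter that does not belong to a BatchNorm layer."""
    for module in model.modules():
        if not isinstance(module, _BN_TYPES):
            for param in module.parameters(recurse=False):
                param.requires_grad = False

def freeze_batchnorm(model: nn.Module) -> None:
    """Freeze only the BatchNorm gamma/beta and fix the running statistics."""
    for module in model.modules():
        if isinstance(module, _BN_TYPES):
            for param in module.parameters(recurse=False):
                param.requires_grad = False  # gamma (weight) and beta (bias)
            # running_mean / running_var are buffers, not parameters; they
            # keep updating in train() mode regardless of requires_grad,
            # so switch the layer to eval mode to freeze them as well.
            module.eval()

# Toy model purely for illustration (not the model from the question):
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)
freeze_all_but_batchnorm(model)  # only the BatchNorm γ/β stay trainable
```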