layer.train() vs layer.eval() vs weight.requires_grad vs weight.grad = 0

sigma_x · July 1, 2021, 11:16am

I’m still confused about these four settings. As far as I understand, they have the following functionality:

layer.train() vs layer.eval() make a difference only for batch normalization layers, since they determine whether the running means and variances are updated and the number of batches is tracked. Only layers can be set to these modes, not the weights. These settings don’t affect gradient computation or the weight update in the model.
weight.requires_grad determines whether gradients for this tensor will be computed or not, but it doesn’t say anything about actually updating the weights in this tensor. This applies only to the weights, not to layers. Weights can be accessed either through layer.weight and layer.bias or by looping through the model.named_parameters() generator. At the same time, running_mean, running_var and num_batches_tracked in BatchNorm layers are not named_parameters, they are named_buffers, but for some reason they can also have requires_grad set to True and hold grad values. I’m not sure whether these gradients are in fact computed or updated somehow.
If I don’t want to update the weights in a particular layer, I loop through model.named_parameters() and set weight.grad = 0 for that layer’s parameters; these obviously don’t include the BatchNorm buffers mentioned above. If weight.requires_grad=True, the gradients for these weights are still computed in this case.
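
One way to check these points empirically is a short script along the following lines. It is only a sketch: the toy model, the tensor shapes and the variable names are made up for illustration, and it assumes default BatchNorm1d settings and plain SGD without momentum or weight decay.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model, used only for illustration.
model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4), nn.Linear(4, 1))
bn = model[1]
x = torch.randn(8, 4)

# 1) Parameters vs. buffers of the BatchNorm layer.
param_names = [name for name, _ in bn.named_parameters()]
buffer_names = [name for name, _ in bn.named_buffers()]
buffers_require_grad = [buf.requires_grad for buf in bn.buffers()]

# 2) train() vs. eval(): check whether a forward pass changes running_mean.
mean_before = bn.running_mean.clone()
model.eval()
model(x)
changed_in_eval = not torch.equal(mean_before, bn.running_mean)

model.train()
model(x)
changed_in_train = not torch.equal(mean_before, bn.running_mean)

# 3) requires_grad=False on the first layer: inspect .grad after backward().
for p in model[0].parameters():
    p.requires_grad = False

model(x).sum().backward()
frozen_grad = model[0].weight.grad      # gradient of the frozen weight
trainable_grad = model[2].weight.grad   # gradient of a weight left trainable

# 4) Zeroing .grad after backward(), then stepping with plain SGD
#    (assumption: no momentum, no weight decay).
layer = nn.Linear(2, 1)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
layer(torch.randn(3, 2)).sum().backward()
weight_before = layer.weight.detach().clone()
layer.weight.grad.zero_()
opt.step()
weight_unchanged = torch.equal(weight_before, layer.weight)

print(param_names, buffer_names, buffers_require_grad)
print(changed_in_eval, changed_in_train)
print(frozen_grad is None, trainable_grad is not None)
print(weight_unchanged)
```

The printed flags can be compared against the statements above. Step 4 only covers plain SGD; optimizers with momentum or weight decay may still change a weight whose grad has been zeroed, so that case would need a separate check.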

Is this all correct?