I’m still confused about these four settings. As far as I understand, they have the following functionality:
`layer.eval()` and `layer.train()` make a difference only for batch-normalization layers, as they determine whether the running means and variances are updated and the number of batches tracked. Only layers (modules) can be set to these modes, not individual weights. These settings don’t affect gradient computation or weight updates in the model.
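A quick check of this understanding (the layer and shapes here are just a toy example with `nn.BatchNorm1d`):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(3)
x = torch.randn(8, 3)

bn.train()      # training mode: running stats are updated on each forward
bn(x)
print(bn.num_batches_tracked)  # tensor(1)

bn.eval()       # eval mode: running stats are frozen
bn(x)
print(bn.num_batches_tracked)  # still tensor(1)
```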
`weight.requires_grad` determines whether gradients will be computed for this tensor, but it says nothing about actually updating the weights in that tensor. It applies only to weights, not to layers. Weights can be accessed either through `layer.weight` / `layer.bias` or by looping through the `model.named_parameters()` generator. At the same time, `running_mean`, `running_var`, and `num_batches_tracked` in BatchNorm layers are not `named_parameters`; they are `named_buffers`, but for some reason they can also have `grad` values, and I’m not sure whether those gradients are in fact computed or applied somehow.
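This is what I see when I inspect a fresh, affine `nn.BatchNorm1d` (again just a toy layer for illustration); the buffers are not autograd leaves, so their `.grad` stays `None`:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(3)

print([name for name, _ in bn.named_parameters()])
# ['weight', 'bias']  -- the learnable affine gamma/beta
print([name for name, _ in bn.named_buffers()])
# ['running_mean', 'running_var', 'num_batches_tracked']

# Buffers don't require grad, and autograd never fills in .grad for them:
print(bn.running_mean.requires_grad)  # False
print(bn.running_mean.grad)           # None
```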
In case I don’t want to update the weights in a particular layer, I loop through `model.named_parameters()` and set each such parameter’s `weight.grad = 0`; these obviously don’t include the BatchNorm buffers mentioned above. If `weight.requires_grad=True`, gradients for these weights are still computed in this case.
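Concretely, what I do looks like this sketch (the model and the `'0.'` name prefix are just assumptions from a toy `nn.Sequential`):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4), nn.Linear(4, 2))

out = model(torch.randn(8, 4)).sum()
out.backward()

# Zero the gradients of the first Linear layer only. The gradients were
# still computed during backward(); they are just discarded before any
# optimizer step would use them.
for name, p in model.named_parameters():
    if name.startswith('0.'):
        p.grad.zero_()

print(model[0].weight.grad.abs().sum())  # tensor(0.)
print(model[2].weight.grad.abs().sum())  # unchanged, left as computed
```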
Is this all correct?