I’m still confused about these four settings. As far as I understand, they have the following functionality:
- `layer.train()` vs `layer.eval()` make a difference only for batch normalization layers, as they determine whether the means and variances are computed and the number of batches tracked or not. Only layers can be set to these modes, not the weights. These settings don't affect the gradient computation and weight update in the model.
- `weight.requires_grad` determines whether gradients for this tensor's weights will be computed or not, but it doesn't say anything about actually updating the weights in this tensor. This applies only to the weights, not layers. Weights can be accessed either through `layer.weight`, `layer.bias` or by looping through the `model.named_parameters()` generator. At the same time, `running_mean`, `running_var`, `num_batches_tracked` in BatchNorm layers are not `named_parameters`, they are `named_buffers`, but for some reason they can also have `requires_grad` set to `True` and `grad` values, and I'm not sure whether the gradients are in fact computed or updated somehow.
- In case I don't want to update the weights in a particular layer, I loop through `model.named_parameters()` and set the named parameter's `weight.grad=0`; these obviously don't include the BatchNorm buffers mentioned above. If `weight.requires_grad=True`, gradients for these weights are still computed in this case.
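To make the first two points concrete, this is a minimal sketch of how I would check them. It assumes a standalone `nn.BatchNorm1d(4)` layer and random input, both chosen only for illustration and not taken from my actual model. It lists what counts as a parameter versus a buffer, and looks at whether the running statistics move in `train()` and `eval()` mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)  # illustrative layer, not from a real model

# Learnable parameters vs. non-learnable buffers of a BatchNorm layer.
param_names = [name for name, _ in bn.named_parameters()]
buffer_names = [name for name, _ in bn.named_buffers()]
print("parameters:", param_names)
print("buffers:", buffer_names)

x = torch.randn(8, 4)

# train(): the forward pass updates the running statistics.
bn.train()
bn(x)
mean_after_train = bn.running_mean.clone()
batches_after_train = int(bn.num_batches_tracked)

# eval(): the forward pass uses the stored statistics and does not update them.
bn.eval()
bn(x + 5.0)
mean_after_eval = bn.running_mean.clone()
batches_after_eval = int(bn.num_batches_tracked)

# By default the buffers are created with requires_grad=False.
buffers_require_grad = [b.requires_grad for b in bn.buffers()]
print("buffers require grad:", buffers_require_grad)
print("running_mean changed in eval:", not torch.equal(mean_after_train, mean_after_eval))
```

If this sketch is right, the buffers are updated by the forward pass in `train()` mode rather than by gradients, which is the part I would like confirmed.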
Is this all correct?
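For the last point, here is a minimal sketch of what I mean by zeroing the gradients. The two-layer `nn.Sequential` model, the learning rate, and the random input are hypothetical and only for illustration. I am also assuming plain SGD without momentum or weight decay, since with either of those a zero gradient would not by itself guarantee that the weight stays unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical two-layer model, only for illustration.
model = nn.Sequential(nn.Linear(3, 3), nn.Linear(3, 1))
# Assumption: plain SGD, no momentum, no weight decay.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

frozen_before = model[0].weight.detach().clone()
trained_before = model[1].weight.detach().clone()

loss = model(torch.randn(5, 3)).sum()
loss.backward()

# requires_grad is still True for layer 0, so its gradient was computed.
grad_was_computed = model[0].weight.grad is not None

# Overwrite the gradients of layer 0 with zeros before the optimizer step.
for name, param in model.named_parameters():
    if name.startswith("0."):
        param.grad.zero_()

optimizer.step()

frozen_unchanged = torch.equal(model[0].weight, frozen_before)
trained_changed = not torch.equal(model[1].weight, trained_before)
print(grad_was_computed, frozen_unchanged, trained_changed)
```

The alternative I am comparing this against is setting `requires_grad = False` on those parameters, which would skip computing their gradients in the first place.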