But isn’t the `state_dict` the weights of the layers themselves? If the layers were properly frozen, there should be no changes to the `state_dict` values of the pretrained layers post-training, unless the `state_dict` represents something other than the layer weights. In my scenario, however, the values within the main model’s `state_dict` for the respective pretrained layers do change.
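For context, the kind of check being described could be sketched like this (a minimal, hypothetical setup using a single frozen `Linear` layer, comparing `state_dict` snapshots before and after the training step that would normally run in between):

```python
import copy
import torch

# Hypothetical frozen layer standing in for the pretrained layers.
model = torch.nn.Linear(2, 2)
for p in model.parameters():
    p.requires_grad = False  # "freeze" the layer

# Snapshot the state_dict values before training.
before = copy.deepcopy(model.state_dict())

# ... training of the main model would happen here ...

after = model.state_dict()

# List every entry whose values changed.
changed = [k for k in before if not torch.equal(before[k], after[k])]
print(changed)  # empty if nothing touched the frozen weights
```

Note the `copy.deepcopy`: `state_dict` returns references to the live tensors, so without the deep copy, `before` and `after` would always compare equal.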
The `state_dict` represents just the “data” of the tensors, i.e. just their values. You should thus not check the `requires_grad` attribute of these tensors or any other Autograd-related attributes.
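This can be seen directly: the entries returned by `state_dict()` are detached from Autograd by default, so their `requires_grad` attribute is `False` regardless of the live parameter’s setting (a small sketch with an arbitrary `Linear` layer):

```python
import torch.nn as nn

m = nn.Linear(2, 2)

# The live parameter participates in Autograd...
print(m.weight.requires_grad)                  # True

# ...but the state_dict entry is a detached view of the same values.
print(m.state_dict()['weight'].requires_grad)  # False
```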
When you say “data”, are you referring to the transformed values of the network’s input, or to the weights of the pretrained layers?

> You should thus not check the `requires_grad` attributes of these tensors or any other Autograd-related attributes.

I think there is some misunderstanding here. I wasn’t referring specifically to the `requires_grad` attribute of those tensors, but rather to the changes in their values. If the `state_dict` does indeed contain the weights of the pretrained layers, it should not change after training the main network with the `requires_grad` attribute of the pretrained layers set to `False`.
The `state_dict` contains all parameters and buffers of the model.
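To make the parameter/buffer distinction concrete, here is a minimal sketch with a hypothetical model mixing a `Linear` layer (parameters only) and a `BatchNorm1d` layer (parameters plus registered buffers such as the running stats):

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 2), nn.BatchNorm1d(2))

# Parameters (weight/bias) and buffers (running_mean, running_var, ...)
# both appear as entries in the state_dict.
print(list(model.state_dict().keys()))
```

The batch norm running statistics are buffers, not parameters: they are updated during the forward pass in training mode, not by the optimizer, so they can change even when every parameter is frozen.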
That behavior is not a bug, but it might be unexpected: optimizers that keep internal running statistics can still update parameters even if their gradients are zero, because the running stats themselves might not be zero. Since you are using `Adam`, this is likely the case here as well.
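This effect can be demonstrated with a single parameter (a toy sketch, not the original training code): one real optimizer step populates Adam’s running statistics (`exp_avg`, `exp_avg_sq`), after which a step with a zero gradient still moves the parameter, because the momentum buffer is nonzero.

```python
import torch

p = torch.nn.Parameter(torch.ones(1))
opt = torch.optim.Adam([p], lr=0.1)

# One real step populates Adam's running statistics.
p.grad = torch.ones(1)
opt.step()

# Now the gradient is zero, but the running stats are not.
before = p.detach().clone()
p.grad = torch.zeros(1)
opt.step()
after = p.detach().clone()

print(torch.equal(before, after))  # False: Adam still moved the parameter
```

This is also why simply zeroing gradients (or relying on `requires_grad=False` after stats have accumulated) is not a reliable way to freeze layers whose parameters were already passed to a stateful optimizer; excluding those parameters from the optimizer’s parameter groups avoids the issue entirely.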