But isn’t the `state_dict` the weights of the layers themselves? If the layers were properly frozen, there should be no changes to the `state_dict` values of the pretrained layers post-training, unless the `state_dict` represents something other than the layer weights. In my scenario, however, the values within the main model’s `state_dict` for the respective pretrained layers do change.
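For context, the kind of check being described could be sketched like this (a minimal, hypothetical setup using a single frozen `Linear` layer, comparing `state_dict` snapshots before and after the training step that would normally run in between):

```python
import copy
import torch

# Hypothetical frozen layer standing in for the pretrained layers.
model = torch.nn.Linear(2, 2)
for p in model.parameters():
    p.requires_grad = False  # "freeze" the layer

# Snapshot the state_dict values before training.
before = copy.deepcopy(model.state_dict())

# ... training of the main model would happen here ...

after = model.state_dict()

# List every entry whose values changed.
changed = [k for k in before if not torch.equal(before[k], after[k])]
print(changed)  # empty if nothing touched the frozen weights
```

Note the `copy.deepcopy`: `state_dict` returns references to the live tensors, so without the deep copy, `before` and `after` would always compare equal.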
The `state_dict` represents just the “data” of the tensors, i.e. just their values. You should thus not check the `requires_grad` attribute of these tensors or any other Autograd-related attributes.
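This can be seen directly: the entries returned by `state_dict()` are detached from Autograd by default, so their `requires_grad` attribute is `False` regardless of the live parameter’s setting (a small sketch with an arbitrary `Linear` layer):

```python
import torch.nn as nn

m = nn.Linear(2, 2)

# The live parameter participates in Autograd...
print(m.weight.requires_grad)                  # True

# ...but the state_dict entry is a detached view of the same values.
print(m.state_dict()['weight'].requires_grad)  # False
```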
When you say “data”, are you referring to the transformed values of the network’s input, or to the weights of the pretrained layers?

> You should thus not check the `requires_grad` attributes of these tensors or any other Autograd-related attributes.

I think there is some misunderstanding here. I wasn’t referring specifically to the `requires_grad` attribute of those tensors, but rather to the changes in their values. If the `state_dict` does indeed contain the weights of the pretrained layers, it should not change after training the main network with the `requires_grad` attribute of the pretrained layers set to `False`.
The `state_dict` contains all parameters and buffers of the model.
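To make the parameter/buffer distinction concrete, here is a minimal sketch with a hypothetical model mixing a `Linear` layer (parameters only) and a `BatchNorm1d` layer (parameters plus registered buffers such as the running stats):

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 2), nn.BatchNorm1d(2))

# Parameters (weight/bias) and buffers (running_mean, running_var, ...)
# both appear as entries in the state_dict.
print(list(model.state_dict().keys()))
```

The batch norm running statistics are buffers, not parameters: they are updated during the forward pass in training mode, not by the optimizer, so they can change even when every parameter is frozen.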
That behavior is not a bug, but it might be unexpected: optimizers that keep internal running statistics can still update parameters even if their gradients are zero, because the running stats themselves might not be zero. Since you are using `Adam`, this is likely the case here as well.
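This effect can be demonstrated with a single parameter (a toy sketch, not the original training code): one real optimizer step populates Adam’s running statistics (`exp_avg`, `exp_avg_sq`), after which a step with a zero gradient still moves the parameter, because the momentum buffer is nonzero.

```python
import torch

p = torch.nn.Parameter(torch.ones(1))
opt = torch.optim.Adam([p], lr=0.1)

# One real step populates Adam's running statistics.
p.grad = torch.ones(1)
opt.step()

# Now the gradient is zero, but the running stats are not.
before = p.detach().clone()
p.grad = torch.zeros(1)
opt.step()
after = p.detach().clone()

print(torch.equal(before, after))  # False: Adam still moved the parameter
```

This is also why simply zeroing gradients (or relying on `requires_grad=False` after stats have accumulated) is not a reliable way to freeze layers whose parameters were already passed to a stateful optimizer; excluding those parameters from the optimizer’s parameter groups avoids the issue entirely.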