I am trying to change a value in my model’s state dict, but even after updating the state dict, the value does not change. Any help would be appreciated.
sd = model.state_dict()
# layer_norm_stats is a dict containing the mean of the layer norm
sd['encoder.layer.11.output.LayerNorm._running_mean'] = layer_norm_stats['encoder.layer.11.output.LayerNorm._running_mean']
print(model.state_dict()['encoder.layer.11.output.LayerNorm._running_mean'])  # prints an all-zero tensor, i.e. the original value
print(layer_norm_stats['encoder.layer.11.output.LayerNorm._running_mean'])  # prints the running mean, which is not the zero vector
The values of the model’s parameters won’t be changed if you assign a new tensor to a key in the state_dict.
You could either load the manipulated state_dict afterwards or change the parameter’s value in-place as shown here:
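A minimal sketch of both options, using a toy `nn.Linear` in place of the original model (the model and key names here are illustrative assumptions, not the ones from the question):

```python
import torch
import torch.nn as nn

# Toy model standing in for the original (an assumption for illustration;
# the real key would be e.g. 'encoder.layer.11.output.LayerNorm._running_mean').
model = nn.Linear(2, 2, bias=False)

# Option 1: manipulate the state_dict, then load it back into the model.
sd = model.state_dict()
sd['weight'] = torch.ones(2, 2)
model.load_state_dict(sd)

# Option 2: change the parameter's value in-place (wrapped in no_grad,
# since mutating a leaf tensor that requires grad would raise an error).
with torch.no_grad():
    model.weight.copy_(torch.zeros(2, 2))
```

Either way, the change is visible in `model.state_dict()` afterwards.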
Okay, I am sorry for not being clear last time. Here is something that seems weird to me:
as you showed above, one has to run model.load_state_dict(sd) again in order to see the change in the weights of the model variable. But the following sequence of commands also changes the value in the model variable, which is confusing:
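The original command sequence is not shown in the thread; a minimal reconstruction of the kind of sequence being described, using a toy `nn.Linear` (the in-place `copy_` into the state_dict entry is the assumed step, based on the discussion that follows):

```python
import torch
import torch.nn as nn

# Toy model standing in for the original (an assumption for illustration).
model = nn.Linear(2, 2, bias=False)
sd = model.state_dict()

# Copy new values *in-place* into the state_dict entry. Because the entry
# shares storage with model.weight, the model's weight changes too,
# without any call to model.load_state_dict:
sd['weight'].copy_(torch.ones(2, 2))

print(model.weight)  # the model's weight has been updated to all ones
```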
On printing model.weight after the above sequence of instructions, one finds that the model’s weights have been updated! From what I understand, Tensor.copy_() behaves something like a deepcopy, so this behavior is counterintuitive.
Thanks for the quick response and for clarifying that .copy_() is an in-place operation. Is there a reason it is designed this way? It is odd that with normal assignment the sd variable we defined does not behave as a reference, but with .copy_() it does.
The state_dict uses references, e.g. to avoid wasting memory. Otherwise each model.state_dict() call would create completely new tensors, which would increase memory usage by the size of the model’s parameters and buffers. Assigning a new tensor to a dict key breaks the reference, so this behavior is also expected.
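This reference behavior can be checked directly by comparing storage pointers; a small sketch with a toy model standing in for the original:

```python
import torch
import torch.nn as nn

# Toy model for illustration (an assumption, not the original model).
model = nn.Linear(2, 2, bias=False)
sd = model.state_dict()

# The state_dict entry shares its storage with the parameter,
# i.e. it is effectively a reference to the same memory:
same_before = sd['weight'].data_ptr() == model.weight.data_ptr()

# Assigning a brand-new tensor to the dict key rebinds the key and
# breaks that reference; the parameter itself is untouched:
sd['weight'] = torch.zeros(2, 2)
same_after = sd['weight'].data_ptr() == model.weight.data_ptr()

print(same_before, same_after)  # → True False
```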
I don’t know which use case you have in mind, but generally yes: creating a new tensor and assigning it to any attribute/key/etc. will rebind the name to the new object and will not manipulate the existing tensor in-place.
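The same distinction holds for any plain Python dict holding tensors; a small illustration of rebinding versus in-place mutation:

```python
import torch

a = torch.zeros(3)

d = {'t': a}
d['t'] = torch.ones(3)  # rebinds the key to a new tensor; a is untouched
print(a)                # still all zeros

d2 = {'t': a}
d2['t'].fill_(5.0)      # in-place op on the shared tensor mutates a as well
print(a)                # now all fives
```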