Changing a state_dict value does not change the model

I am trying to change a value in my model's state_dict, but even after updating the state_dict, the value in the model does not change. Any help would be appreciated.

sd = model.state_dict()
sd['encoder.layer.11.output.LayerNorm._running_mean'] = layer_norm_stats['encoder.layer.11.output.LayerNorm._running_mean']  # layer_norm_stats is a dict containing the running mean of the layer norm
print(model.state_dict()['encoder.layer.11.output.LayerNorm._running_mean'])  # prints an all-zero tensor, i.e. the original value
print(layer_norm_stats['encoder.layer.11.output.LayerNorm._running_mean'])  # prints the running mean, which is not the zero vector

Shouldn't you print sd['encoder.layer.11.output.LayerNorm._running_mean'] instead, to check whether it has changed?

Thanks for your reply. That is exactly my question: shouldn't changing the copy of the state_dict also directly change the model's values?

The values of the model's parameters won't change if you assign a new tensor to a key in the state_dict.
You could either load the manipulated state_dict afterwards or change the parameter's value inplace, as shown here:

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
print(model.weight)
> Parameter containing:
tensor([[0.8777]], requires_grad=True)

sd = model.state_dict()
sd['weight'] = torch.tensor([[1.]])
print(model.weight)
> Parameter containing:
tensor([[0.8777]], requires_grad=True)

model.load_state_dict(sd)
print(model.weight)
> Parameter containing:
tensor([[1.]], requires_grad=True)


# or
model = nn.Linear(1, 1)
print(model.weight)
> Parameter containing:
tensor([[-0.8112]], requires_grad=True)

with torch.no_grad():
    sd = model.state_dict()
    sd['weight'].fill_(1.)
print(model.weight)
> Parameter containing:
tensor([[1.]], requires_grad=True)
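
Applied to your snippet, that would look something like this (just a sketch, assuming layer_norm_stats holds tensors with the same shape as the corresponding state_dict entries):

sd = model.state_dict()
with torch.no_grad():
    sd['encoder.layer.11.output.LayerNorm._running_mean'].copy_(
        layer_norm_stats['encoder.layer.11.output.LayerNorm._running_mean'])
print(model.state_dict()['encoder.layer.11.output.LayerNorm._running_mean'])  # should now print the copied stats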

Thank you that was helpful!

Could you please explain why it is designed like this? Because if one does a deepcopy and then re-assigns, it gets updated.

Could you explain your concern a bit more, please?
I don’t understand which object you are creating a deepcopy of and what your use case is.

Okay, I am sorry for not being clear the last time. Here is something that seems weird to me:

As you have shown above, one has to run model.load_state_dict(sd) again in order to see the change in the weights of the model variable. But the following sequence of commands also changes the value in the model variable, which is confusing:

>>> model = nn.Linear(1, 1)
>>> sd = model.state_dict()
>>> sd['weight'].copy_(torch.tensor([[1.]]))

On printing model.weight after the above sequence of instructions, one will find that the model's weights have been updated! From what I understand, Tensor.copy_() behaves similarly to a deepcopy, but this behavior is counterintuitive.

Thanks for the explanation. As explained in this post, references are stored in the state_dict, and thus inplace operations are visible in the original model, too.

No, the comparison to a deepcopy is wrong: .copy_() is an inplace operation and does not create a separate copy, but manipulates the underlying data directly inplace.
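
To make the difference concrete, here is a minimal sketch (the values are just examples) contrasting .copy_() with an actual copy.deepcopy:

import copy
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
sd = model.state_dict()

# .copy_() writes the new values into the existing storage, which the parameter shares,
# so the model is updated
sd['weight'].copy_(torch.tensor([[1.]]))
print(model.weight)
> Parameter containing:
tensor([[1.]], requires_grad=True)

# copy.deepcopy creates independent storage, so modifying the copy leaves the model alone
sd_copy = copy.deepcopy(sd)
sd_copy['weight'].fill_(2.)
print(model.weight)
> Parameter containing:
tensor([[1.]], requires_grad=True)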

Thanks for the quick response and for clarifying that .copy_() is an in-place operation. Is there any reason it is designed this way? It is weird that when I use a normal assignment, the sd variable we defined does not behave as a reference, but when using .copy_() it does!

The state_dict uses references, e.g. to avoid wasting memory. Otherwise, each model.state_dict() call would create completely new tensors, which would increase the memory usage by the model's parameter and buffer size. Assigning a new tensor to a dict key breaks the reference, so this behavior is also expected.
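
As a small sketch, you can check that a state_dict entry and the corresponding parameter share the same underlying storage, and that assigning a new tensor only rebinds the dict key:

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
sd = model.state_dict()

# the entry points to the same underlying storage as the parameter
print(sd['weight'].data_ptr() == model.weight.data_ptr())
> True

# assigning a new tensor rebinds the key in this dict only; the parameter is untouched
sd['weight'] = torch.ones(1, 1)
print(sd['weight'].data_ptr() == model.weight.data_ptr())
> False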

Then this should be true for all types of assignments, right, not just in-place assignments?
Thanks again for the quick response.

I don't know which use case you are thinking about, but generally yes: creating a new tensor and assigning it to any attribute/key/etc. will overwrite the existing object and will not manipulate it inplace.
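
The same holds for any Python container holding tensors, nothing here is specific to the state_dict (a small sketch):

import torch

t = torch.zeros(2)
d = {'t': t}

d['t'] = torch.ones(2)  # rebinds the key to a new tensor, t is unchanged
print(t)
> tensor([0., 0.])

d = {'t': t}
d['t'].fill_(1.)  # mutates the shared tensor inplace, so t changes as well
print(t)
> tensor([1., 1.])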

NP. Thanks for the clarification!

Sorry to bother you, but I am still confused about this. Does "the state_dict uses references" mean that every time I call .state_dict(), it returns references to the parameters? So that even if I assign .state_dict()['weight'] = 1, which breaks the reference and changes the value in that dict, the return value of the next .state_dict() call will not change?

for param in model.state_dict():
    model.state_dict()[param] = 1

for param in model.state_dict():
    print(param, model.state_dict()[param])

Yes, as seen in your code. You will replace the stored parameter reference at the param key with any object, which doesn't even have to be a valid tensor, as seen in your example. If you want to manipulate the parameters through the state_dict, use inplace operations:

model = nn.Linear(1, 1)

for param in model.state_dict():
    model.state_dict()[param].fill_(1.)

for param in model.state_dict():
    print(param, model.state_dict()[param])

# weight tensor([[1.]])
# bias tensor([1.])

There are no other problems; I sincerely thank you for your generous and clear reply! Wish you a good day 🙂
