In the following code I clone the state dict of my model's parameters and load the clone back into the model. I would like to perform autograd of the model with respect to the original state dict.
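In other words, the behaviour I am after looks roughly like the sketch below (assuming a recent PyTorch where torch.func.functional_call is available; this is only meant to illustrate the goal, not the code under test):

import torch
from torch.func import functional_call

mlp = torch.nn.Sequential(torch.nn.Linear(1, 2, bias=False))
# Clone the parameters into a plain dict of leaf tensors that require grad.
params = {k: v.clone().requires_grad_(True) for k, v in mlp.state_dict().items()}
x = torch.tensor([[10.0], [100.0]])
# Run the forward pass with `params` substituted for the module's own
# parameters, so the output is connected to `params` in the autograd graph.
y = functional_call(mlp, params, (x,))
grads = torch.autograd.grad(y.sum(), list(params.values()))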
I have created the following code to test it:
import torch
from collections import OrderedDict
from torch import optim

if __name__ == "__main__":
    # Initialising the model
    mlp = torch.nn.Sequential(torch.nn.Linear(1, 2, bias=False))
    mlp[0].weight = torch.nn.Parameter(torch.tensor([1.0, 2.0], requires_grad=True))
    # Storing the model's state dict
    original_state_dict = OrderedDict([(k, v) for (k, v) in mlp.state_dict().items()])
    for item in original_state_dict.items():
        item[1].requires_grad = True
    # Loading a clone of the state dict into the model
    mlp_bis_sttdict = OrderedDict([(k, torch.clone(v)) for (k, v) in original_state_dict.items()])
    mlp.load_state_dict(mlp_bis_sttdict)
    print(next(mlp.parameters()))
    # Inference of the model without stepping the parameters
    x = torch.tensor([10.0, 100.0], requires_grad=True)
    optimizer = optim.Adam(mlp.parameters(), lr=0.5)
    y = mlp(x)
    y.backward()  # to change with grad = torch.autograd.grad(y, original_state_dict.values())
    # Displaying the grads
    print("## No step")
    print(list(original_state_dict.values())[0].grad)
    print(list(mlp_bis_sttdict.values())[0].grad)
    print(next(mlp.parameters()).grad)
    # Stepping and displaying parameter changes
    print("## Step")
    optimizer.step()
    print(mlp_bis_sttdict)
    print(original_state_dict)
    print(mlp.state_dict())
I obtain the following results:
Why does the original state dict not store the gradients, yet its values are correctly updated when I step the optimizer?
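To make the puzzle concrete, here are the extra checks I ran after the script above (my own diagnostic additions; the outputs in the comments are what I observe or expect):

# The dict entry never receives a gradient: it is not the tensor that was
# used in the forward pass, and it is not attached to any autograd graph.
print(original_state_dict["0.weight"].grad)     # None
print(original_state_dict["0.weight"].grad_fn)  # None
# Yet it appears to share storage with the live parameter, which would
# explain why optimizer.step() shows up in it as well.
print(original_state_dict["0.weight"].data_ptr() == mlp[0].weight.data_ptr())  # True
print(mlp_bis_sttdict["0.weight"].data_ptr() == mlp[0].weight.data_ptr())      # False, it is a clone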