Computing gradients of cloned parameters

In the following code I clone the state dict of my model's parameters and load the clone back into the model. I would like autograd to compute gradients with respect to the original state dict. I wrote the following code to test this:

import torch
from collections import OrderedDict
from torch import optim

if __name__ == "__main__":
    # Initialising the model
    mlp = torch.nn.Sequential(torch.nn.Linear(1, 2, bias=False))
    # NB: this replaces the (2, 1) weight with a 1-D tensor, so mlp(x) below
    # reduces to a dot product returning a scalar
    mlp[0].weight = torch.nn.Parameter(torch.tensor([1.0, 2.0], requires_grad=True))

    # Storing the model's state dict
    original_state_dict = OrderedDict([(k,v) for (k,v) in mlp.state_dict().items()])
    for item in original_state_dict.items():
        item[1].requires_grad= True

    # Loading a clone of the dict into the model
    mlp_bis_sttdict = OrderedDict([(k,torch.clone(v)) for (k,v) in original_state_dict.items()])
    mlp.load_state_dict(mlp_bis_sttdict)
    print(next(mlp.parameters()))


    # Inference without stepping the parameters
    x = torch.tensor([10.0, 100.0], requires_grad=True)
    optimizer = optim.Adam(mlp.parameters(), lr=0.5)
    y=mlp(x)
    y.backward()  # to change with grad = torch.autograd.grad(y, original_state_dict.values())

    # displaying grad
    print("## No step")
    print(next(iter(original_state_dict.values())).grad)
    print(next(iter(mlp_bis_sttdict.values())).grad)
    print(next(mlp.parameters()).grad)

    # stepping and displaying parameter changes
    print("## Step")
    optimizer.step()
    print(mlp_bis_sttdict)
    print(original_state_dict)
    print(mlp.state_dict())

I obtain the following results:

Why doesn't the original state dict store the gradients, yet its values are correctly updated when stepping the optimizer?

I think I found an answer to my question:

When calling `load_state_dict`, the computational graph is not loaded: the values are copied into the model's parameters in-place, without recording any autograd operation. On top of that, `state_dict()` (with the default `keep_vars=False`) returns detached tensors to begin with, so the dict objects are never part of the graph and no gradients can be backpropagated to them; gradients only accumulate on `mlp.parameters()`. The dict values still appear updated after `optimizer.step()` because the detached tensors share storage with the live parameters.
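This can be checked directly: the tensors returned by `state_dict()` are detached from the graph but share storage with the live parameters. A minimal sketch, using the same tiny model as above:

```python
import torch

mlp = torch.nn.Sequential(torch.nn.Linear(1, 2, bias=False))

sd = mlp.state_dict()  # default keep_vars=False -> values are detached
w = sd["0.weight"]

# Detached: not the Parameter object itself, and no grad tracking
print(w is mlp[0].weight)  # False
print(w.requires_grad)     # False

# ...but it shares storage with the live parameter, which is why
# optimizer.step() updates are visible through the dict value
print(w.data_ptr() == mlp[0].weight.data_ptr())  # True
```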
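If the goal is to get gradients with respect to tensors held in a dict, one option (assuming PyTorch ≥ 2.0; in earlier versions the same function lives at `torch.nn.utils.stateless.functional_call`) is `torch.func.functional_call`, which runs the forward pass with the dict's tensors substituted for the module's parameters, so autograd tracks them. A sketch, using the proper `(out_features, in_features)` shape for the linear layer rather than the 1-D weight above:

```python
import torch
from collections import OrderedDict
from torch.func import functional_call

mlp = torch.nn.Sequential(torch.nn.Linear(1, 2, bias=False))
mlp[0].weight = torch.nn.Parameter(torch.tensor([[1.0], [2.0]]))  # shape (2, 1)

# Clone the parameters into leaf tensors that require grad
params = OrderedDict(
    (name, p.detach().clone().requires_grad_(True))
    for name, p in mlp.named_parameters()
)

x = torch.tensor([[10.0]])

# Forward pass with the cloned tensors substituted for the parameters
y = functional_call(mlp, params, (x,))

# Gradients now flow to the dict's tensors
grads = torch.autograd.grad(y.sum(), tuple(params.values()))
print(grads[0])  # tensor([[10.], [10.]])
```

Note that `torch.autograd.grad` returns the gradients instead of accumulating them in `.grad`, matching the `grad = torch.autograd.grad(...)` variant mentioned in the code comment above.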