copy.deepcopy() vs clone()

When copying modules/tensors around, which one should I use?
Are they interchangeable?
Thanks a lot


Hi @Shisho_Sama,

For Tensors in most cases, you should go for clone since this is a PyTorch operation that will be recorded by autograd.

>>> t = torch.rand(1, requires_grad=True)
>>> t.clone()
tensor([0.4847], grad_fn=<CloneBackward>) # <=== as you can see here
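
To make "recorded by autograd" concrete, here is a minimal sketch (the variable names and the toy loss are just for illustration): the clone owns its own memory, but gradients still flow back through it to the original tensor.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.clone()            # new memory, but still connected to x in the graph
loss = (y * 2).sum()     # toy loss, purely illustrative
loss.backward()          # the gradient flows back through the clone to x

print(y.grad_fn)                      # a CloneBackward node (exact name varies by version)
print(x.grad)                         # gradient of the loss w.r.t. the original tensor
print(y.data_ptr() == x.data_ptr())   # False: the clone has its own storage
```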

When it comes to Module, there is no clone method available so you can either use copy.deepcopy or create a new instance of the model and just copy the parameters, as proposed in this post Deep copying PyTorch modules.
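
A small sketch of both options for modules, assuming a plain nn.Linear as a stand-in for your model (the layer sizes are arbitrary):

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for a real model

# Option 1: deep copy the whole module
model_copy = copy.deepcopy(model)

# Option 2: create a new instance and copy the parameters over
model_new = nn.Linear(4, 2)
model_new.load_state_dict(model.state_dict())

# Changing the original in place should not touch either copy
with torch.no_grad():
    model.weight.add_(1.0)

print(torch.equal(model_copy.weight, model_new.weight))  # copies still agree
print(torch.equal(model.weight, model_copy.weight))      # original has diverged
```

Option 2 requires that you can reconstruct the module with the same constructor arguments; deepcopy does not.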


Hi, Thanks a lot.
So this means that when I call clone() on tensors, the clones are still part of the graph, and any operations on them will be reflected in the graph, right? For example, will changing their values or attributes also change the original tensor, or affect the graph computation during the backward pass?
In the case of the following two code snippets, what happens in each case?
I think deepcopy disregards any graph-related information and just copies the data as if it were a simple object, while clone creates a new tensor whose operations are reflected in the graph, and to prevent this I need to use detach as well. Am I right?

weights_encoder = sae_model.encoder[0] 
weights_decoder = sae_model.decoder[0]


weights_encoder = copy.deepcopy(sae_model.encoder[0])
weights_decoder = copy.deepcopy(sae_model.decoder[0])

When you use .data, you get a new Tensor with requires_grad=False, so cloning it won’t involve autograd. So both are equivalent, but there might be a (small) speed difference, I am not sure about that.

Another use case is when you want to clone/copy a non-parameter Tensor without autograd. You should use .detach() (and not .data) before cloning:

>>> t = torch.rand(1, requires_grad=True)
>>> t.detach().clone()

Thank you very much. I really appreciate it :slight_smile:


Is there any difference with t.clone().detach()?


Yes, there is. Both give the same result, but t.clone().detach() is less efficient: t.clone() first creates a copy that is recorded in the autograd graph, and detach() then has to cut that copy loose from the graph again. With t.detach().clone(), the copy is never tracked in the first place, so that redundant bookkeeping is avoided.
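
As a quick check that the two orderings end up in the same place, here is a minimal sketch (it only compares the results, it does not measure speed):

```python
import torch

t = torch.rand(3, requires_grad=True)

a = t.detach().clone()   # detach first, so the clone is never tracked
b = t.clone().detach()   # clone is tracked, then cut loose from the graph

print(torch.equal(a, b))                    # same values
print(a.requires_grad, b.requires_grad)     # neither requires grad
print(a.grad_fn, b.grad_fn)                 # neither has a grad_fn
print(a.data_ptr() != t.data_ptr(),
      b.data_ptr() != t.data_ptr())         # both own memory separate from t
```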


I never understood this. Why would one ever want to have clone be in the computation graph? It’s just the identity!

When I make a copy of something I usually expect a brand new object, with new memory allocation and new instance of the object class it belongs. Not just copying pointers/references around. Can you clarify?

Answered here and here.


Let me see if I understand (it seems the accepted answer here is outdated; .data is either no longer in the library or is going to be removed, according to what I’ve read in other answers from albanD).

.clone() produces a new tensor instance with a new memory allocation for the tensor data. In addition, it remembers the history of the original tensor, is connected to the earlier graph, and appears there as CloneBackward. The main advantage seems to be that it’s safer with respect to in-place ops, afaik.
deepcopy makes a deep copy of the original tensor, meaning it creates a new tensor instance with a new memory allocation for the tensor data (it definitely does this part correctly, from my tests). I assume it also makes a complete copy of the history, either pointing to the old history or creating a brand new deep copy of it. I’m unsure how to test this, but I believe that if it is to behave as a proper deep copy method, it should create a new history that mirrors the earlier one (instead of just pointing to it).

Test I did wrt memory allocation:

def clone_vs_deepcopy():
    import copy
    import torch

    x = torch.tensor([1,2,3.])
    x_clone = x.clone()
    x_deep_copy = copy.deepcopy(x)
    # modify the original in place; the copies should not change
    x.mul_(-1)
    print(f'x = {x}')
    print(f'x_clone = {x_clone}')
    print(f'x_deep_copy = {x_deep_copy}')


x = tensor([-1., -2., -3.])
x_clone = tensor([1., 2., 3.])
x_deep_copy = tensor([1., 2., 3.])

Since neither copy changed, they must be using different memory. I just realized I could have checked it with id or something… alas.
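
For what it's worth, a more direct check is probably data_ptr() rather than id(): id() compares Python objects, and two distinct tensor objects can still share the same memory. A minimal sketch (x_view is only there for contrast):

```python
import copy
import torch

x = torch.tensor([1.0, 2.0, 3.0])
x_clone = x.clone()
x_deep_copy = copy.deepcopy(x)
x_view = x.detach()   # a different Python object that shares x's memory

print(x_clone.data_ptr() == x.data_ptr())      # False: separate memory
print(x_deep_copy.data_ptr() == x.data_ptr())  # False: separate memory
print(x_view is x)                             # False: not the same object...
print(x_view.data_ptr() == x.data_ptr())       # True: ...but the same memory
```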

I am still seeking clarification on the history part. Is it a deep copy of that or a pointer copy if we use deep copy?

I know that for clone it is a pointer copy to the original history and not a complete deep copy.



The history will not be copied, as you cannot call copy.deepcopy on a non-leaf tensor:

x = torch.randn(1, requires_grad=True)
y = x + 1
copy.deepcopy(y)
> RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

While y will be attached to the computation graph and will have a valid .grad_fn, you can only copy leaves as stated in the error message.

If you want to keep the history, use .clone(); otherwise, call .detach() on the tensor in addition to the clone() call.
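
Put together, a small sketch of the options for a non-leaf tensor. The try/except is there because the exact behavior and error text of deepcopy may differ between PyTorch versions:

```python
import copy
import torch

x = torch.randn(1, requires_grad=True)   # leaf tensor
y = x + 1                                # non-leaf tensor with a grad_fn

try:
    copy.deepcopy(y)                     # expected to fail for non-leaf tensors
    deepcopy_failed = False
except RuntimeError as err:
    deepcopy_failed = True
    print(err)

y_with_history = y.clone()               # keeps the link to the graph
y_no_history = y.detach().clone()        # plain copy of the values only
x_copy = copy.deepcopy(x)                # leaves can be deep-copied

print(y_with_history.grad_fn is not None)
print(y_no_history.grad_fn is None)
print(x_copy.is_leaf, x_copy.requires_grad)
```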


I always wondered why that error appeared!

What’s the reason for those semantics?

I don’t know why deepcopy isn’t supported here (it also wasn’t supported on Variables). My best guess is that clone() and detach().clone() are valid workarounds and are also more explicit.


Who would know why it’s not supported?

Hi! This thread has been extremely useful but I ran into an issue with using copy.deepcopy() on a model inherited from pl.LightningModule where "_trainer" attribute was not None. It would throw the following error:

AttributeError: 'MyModel' object has no attribute '_parameters'

I noticed that setting it to None, running copy.deepcopy() and then resetting it would solve this issue. Is this an issue caused by PyTorch Lightning?
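
In case it helps anyone else, the workaround can be sketched roughly like this. deepcopy_without_attr is a hypothetical helper name, and the example uses a plain nn.Linear with a lock as a stand-in for an attribute that does not deep-copy cleanly, not an actual LightningModule or trainer:

```python
import copy
import threading
import torch.nn as nn

def deepcopy_without_attr(model, attr_name="_trainer"):
    """Hypothetical helper: deep-copy `model` with `attr_name` temporarily set to None."""
    saved = getattr(model, attr_name, None)
    setattr(model, attr_name, None)
    try:
        return copy.deepcopy(model)
    finally:
        # always restore the attribute on the original model
        setattr(model, attr_name, saved)

model = nn.Linear(4, 2)              # stand-in for the real model
model._trainer = threading.Lock()    # stand-in for an attribute that cannot be deep-copied

model_copy = deepcopy_without_attr(model)
print(model_copy._trainer is None)   # the copy carries no trainer reference
print(model._trainer is not None)    # the original has its attribute back
```

Note that the copy ends up with the attribute set to None, so it would need to be attached again if the copy is going to be used for training.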

This issue might be specific to Lightning and seems to be related to this one.

Got it. Will follow up on this discussion in the github issue thread! Thanks for your help. :slight_smile: