tensor.clone().detach() vs tensor.detach()?

What’s the difference between tensor.clone().detach() and tensor.detach()? Since detach() already returns a detached version of the tensor, what is the point of cloning?

clone() allocates new memory and copies the data into it, whereas detach() returns a tensor that shares the same memory as the original.
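
A quick way to check the memory claim without a GPU is to compare data pointers. This is only a minimal sketch on a small CPU tensor; the variable names are illustrative and not taken from the original post.

```python
import torch

a = torch.randn(4)        # small CPU tensor, so no GPU is required
b = a.detach()            # new tensor object, same underlying memory
c = a.clone().detach()    # clone() copies the data first, then detach()

# data_ptr() returns the address of the tensor's first element
shares_b = a.data_ptr() == b.data_ptr()
shares_c = a.data_ptr() == c.data_ptr()
print(shares_b)  # True: b aliases a
print(shares_c)  # False: c lives in its own memory
```

The same comparison should hold on a CUDA tensor, where the extra copy shows up in torch.cuda.memory_allocated.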

Compare the following code:

import torch

device = torch.device("cuda")
a = torch.randn([10000, 10000])  # 1e8 float32 values, roughly 0.4 GB
a = a.to(device)
print(round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
b = a.detach()  # shares memory with a, so nothing new is allocated
print(round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
c = a.clone().detach()  # clone() copies the data, so usage grows
print(round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')

Output:

0.4 GB
0.4 GB
0.7 GB

to this code:

import torch

device = torch.device("cuda")
a = torch.randn([10000, 10000])  # 1e8 float32 values, roughly 0.4 GB
a = a.to(device)
print(round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
b = a.clone().detach()  # clone() copies the data, so usage grows here
print(round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
c = a.detach()  # shares memory with a, so usage stays the same
print(round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')

Output:

0.4 GB
0.7 GB
0.7 GB

Does that mean that any modifications made to the detached tensor also occur to the attached version?


That’s true. Following the same example:

import torch

device = torch.device("cuda")
a = torch.randn([2])
a = a.to(device)
print(a)
b = a.detach()  # shares memory with a
print(b)
c = a.clone().detach()  # independent copy of a
print(c)
b[0] = 1.  # in-place write through the detached tensor
print(a)  # a changes as well
print(c)  # the clone keeps the original values

Output:

tensor([ 0.2042, -1.8436], device='cuda:0')
tensor([ 0.2042, -1.8436], device='cuda:0')
tensor([ 0.2042, -1.8436], device='cuda:0')
tensor([ 1.0000, -1.8436], device='cuda:0')
tensor([ 0.2042, -1.8436], device='cuda:0')
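
The shared memory also matters for autograd, which is one practical reason to clone. The sketch below is an illustration under my own assumptions (the names x, out, shared and copy are made up for the example): it edits a detached tensor in place and then calls backward. sigmoid is used because its backward pass reuses its own output.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Case 1: detach() only. `shared` aliases the memory of `out`.
out = x.sigmoid()          # sigmoid keeps its output for the backward pass
shared = out.detach()
shared.zero_()             # in-place edit, also overwrites `out`
try:
    out.sum().backward()
    backward_failed = False
except RuntimeError:       # autograd notices the saved output was modified
    backward_failed = True

# Case 2: clone().detach(). `copy` has its own memory.
out2 = x.sigmoid()
copy = out2.clone().detach()
copy.zero_()               # does not touch `out2`
out2.sum().backward()      # runs normally
grad_ok = x.grad is not None

print(backward_failed, grad_ok)
```

So if the detached tensor is going to be modified in place, cloning it first keeps the original graph safe, at the cost of the extra memory shown earlier.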

Apart from the difference in memory, do the two results otherwise have the same properties, such as the same values, requires_grad=False, and so on?
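
One way to look into this is to print a few attributes side by side. This is a rough sketch rather than an exhaustive comparison; the attributes chosen and the helper name describe are my own picks for illustration.

```python
import torch

a = torch.randn(3, requires_grad=True)
b = a.detach()
c = a.clone().detach()

def describe(t):
    # Collect a handful of properties that are easy to compare
    return {
        "requires_grad": t.requires_grad,
        "grad_fn": t.grad_fn,
        "is_leaf": t.is_leaf,
        "dtype": t.dtype,
        "device": t.device,
        "shape": tuple(t.shape),
    }

props_b = describe(b)
props_c = describe(c)
print(props_b == props_c)                 # same metadata for both
print(torch.equal(b, c))                  # same values
print(b.data_ptr() == c.data_ptr())       # different memory
```

In this sketch both tensors report requires_grad=False and no grad_fn, and they hold equal values; the only difference that shows up is the memory they point to.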
