What’s the difference between tensor.clone().detach()
and tensor.detach()
? Since detach returns a detached version of the tensor, what is the point of cloning?
When the clone
method is used, torch allocates new memory for the returned tensor, but with the detach
method, the returned tensor shares the same underlying memory as the original.
Compare the following code:
import torch
device = torch.device("cuda")
a = torch.randn([10000, 10000])
a = a.to(device)
print(round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
b = a.detach()
print(round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
c = a.clone().detach()
print(round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
0.4 GB
0.4 GB
0.7 GB
to this code:
import torch
device = torch.device("cuda")
a = torch.randn([10000, 10000])
a = a.to(device)
print(round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
b = a.clone().detach()
print(round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
c = a.detach()
print(round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
0.4 GB
0.7 GB
0.7 GB
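A more direct way to confirm the storage sharing (a minimal sketch of my own, runnable on CPU so no GPU is needed) is to compare the tensors' data_ptr() values:

```python
import torch

a = torch.randn(3)
b = a.detach()          # shares the same underlying storage as a
c = a.clone().detach()  # clone copies the data into new storage first

print(a.data_ptr() == b.data_ptr())  # True: same underlying memory
print(a.data_ptr() == c.data_ptr())  # False: clone allocated new memory
```

This matches the memory readings above: detach adds no allocation, while clone does.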
Does that mean that any modification made to the detached tensor also affects the original (attached) tensor?
That’s true. Following the same example:
import torch
device = torch.device("cuda")
a = torch.randn([2])
a = a.to(device)
print(a)
b = a.detach()
print(b)
c = a.clone().detach()
print(c)
b[0] = 1.
print(a)
print(c)
tensor([ 0.2042, -1.8436], device='cuda:0')
tensor([ 0.2042, -1.8436], device='cuda:0')
tensor([ 0.2042, -1.8436], device='cuda:0')
tensor([ 1.0000, -1.8436], device='cuda:0')
tensor([ 0.2042, -1.8436], device='cuda:0')
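This storage sharing also matters for autograd. A minimal sketch (my own, not from the example above, and runnable on CPU) of what happens when the shared memory is modified in place while the original tensor is still needed for a backward pass — PyTorch's version counter detects the change:

```python
import torch

a = torch.randn(2, requires_grad=True)
out = (a * a).sum()   # the values of a are saved for the backward pass
b = a.detach()        # b shares storage (and version counter) with a
b[0] = 1.0            # in-place write through the shared memory

try:
    out.backward()
except RuntimeError:
    # PyTorch notices that a was modified after being saved for backward
    print("in-place modification detected")
```

With c = a.clone().detach() instead, the write would go to separate memory and the backward pass would succeed.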
Apart from the difference in memory, do they share the same properties, such as the values, requires_grad=False, and so on?
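A quick check (my own sketch, CPU-only) suggests the two results are indistinguishable in their visible properties — values, requires_grad, dtype, and device all match; only the storage sharing differs:

```python
import torch

a = torch.randn(2, requires_grad=True)
b = a.detach()
c = a.clone().detach()

print(b.requires_grad, c.requires_grad)          # False False: both drop grad tracking
print(torch.equal(b, c))                         # True: identical values
print(b.dtype == c.dtype, b.device == c.device)  # True True
```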