I ran into some confusion about CUDA memory allocation while following the tutorial on the official website:
import torch

device = torch.device("cuda:0")
x = torch.ones(2, 2, requires_grad=True, device=device)
print(x)
In this situation, the memory usage on gpu:0 is already 863 MB, even though all I have created is a single 2x2 tensor x:
tensor([[1., 1.],
        [1., 1.]], device='cuda:0', requires_grad=True)
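For reference, the allocator has its own counters that report tensor bytes rather than the process-wide total. A minimal sketch of how I queried it (this assumes torch.cuda.memory_allocated, which as I understand it reports the bytes currently occupied by live tensors):

torch.cuda.synchronize()
# A 2x2 float32 tensor is only 16 bytes; the caching allocator rounds small
# allocations up to a minimum block size (512 bytes, as far as I know), so
# this prints a tiny number, nowhere near 863 MB.
print(torch.cuda.memory_allocated(device))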
So I created another tensor y to watch how the memory changes:
y = x + 2
print(y)
But the memory on gpu:0 is still 863 MB. Here you can see that autograd really does track the computation history in grad_fn:
tensor([[3., 3.],
        [3., 3.]], device='cuda:0', grad_fn=<AddBackward0>)
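The extra allocation from an op like this is far too small to show up in a whole-GPU readout, so I would measure the delta with the allocator counter instead. A rough sketch (y2 is a hypothetical second copy, just for measuring):

before = torch.cuda.memory_allocated(device)
y2 = x + 2
after = torch.cuda.memory_allocated(device)
# Expect roughly one more minimum-size block (~512 bytes), which is
# invisible at the granularity of the 863 MB figure.
print(after - before)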
I also tried testing with torch.no_grad():, mentioned in https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html, because it can prevent tracking history and using memory.
with torch.no_grad():
    z = y * y * 3
    out = z.mean()
print(z)
Here we can see that the grad_fn in z disappears, so theoretically it should save memory on gpu:0. But in this situation the reported memory is still 863 MB, so I can't tell exactly whether it saves memory or not.
tensor([[27., 27.],
        [27., 27.]], device='cuda:0')
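As a small sanity check that no_grad really detaches the result from the graph, I also compared the same operation inside and outside the context (the name z_tracked is just for illustration):

with torch.no_grad():
    z = y * y * 3
print(z.requires_grad, z.grad_fn)       # False None -> nothing is recorded

z_tracked = y * y * 3                   # same computation with tracking on
print(z_tracked.requires_grad, z_tracked.grad_fn)   # True <MulBackward0 ...>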
In conclusion, I have two questions:
1. What is the CUDA memory allocation method in PyTorch? Does it cache some memory for future use? If so, how can I turn this mechanism off or release the cache? I don't want a small model to end up holding a lot of memory later (see the first sketch after this list).
2. Does with torch.no_grad(): really prevent memory usage? I can't observe whether it saves memory or not in this situation (see the second sketch below).
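For question 1, this is the experiment I would run to see whether memory is being cached. It relies on my understanding that torch.cuda.memory_allocated reports bytes in live tensors, torch.cuda.memory_reserved reports what the caching allocator holds on to (older versions name it torch.cuda.memory_cached), and torch.cuda.empty_cache() returns unused cached blocks to the driver:

buf = torch.ones(1024, 1024, device=device)   # ~4 MB of tensor data
print(torch.cuda.memory_allocated(device))    # bytes occupied by live tensors
print(torch.cuda.memory_reserved(device))     # bytes held by the caching allocator

del buf
print(torch.cuda.memory_allocated(device))    # drops back down
print(torch.cuda.memory_reserved(device))     # stays up: the freed block is cached

torch.cuda.empty_cache()                      # release unused cached blocks to the driver
print(torch.cuda.memory_reserved(device))     # should shrink again; as far as I know
                                              # the CUDA context itself cannot be freed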
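For question 2, my understanding is that the saving only becomes visible at a larger scale, because with tracking enabled the graph keeps tensors that were saved for backward alive even after their Python names are deleted. A sketch of the comparison I have in mind (run_once and the 1024x1024 size are just illustrative):

def run_once(track):
    base = torch.cuda.memory_allocated(device)
    with torch.set_grad_enabled(track):
        a = torch.ones(1024, 1024, device=device, requires_grad=track)
        h = a.exp()   # exp() saves its output for backward when tracking is on
        out = h.sum()
    del a, h          # drop the Python references
    # With tracking, out.grad_fn should still keep the ~4 MB intermediate
    # alive; without tracking, it should have been freed by now.
    used = torch.cuda.memory_allocated(device) - base
    del out
    return used

print(run_once(True))    # expected: several MB retained by the graph
print(run_once(False))   # expected: only out's own tiny block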