Memory_allocated

>>> out = torch.randn(100,100)
>>> out.cuda()
>>> torch.cuda.memory_allocated(0)
40448
>>> out1 = torch.randn(100,100)
>>> torch.cuda.memory_allocated(0)
0

Why is this happening? I expected the output of torch.cuda.memory_allocated(0) to still be 40448, because out1 is a completely different tensor from out.
Note that,

>>> id(out)
139989884735632
>>> id(out1)
139989884693456

Note 2:

>>> out = torch.randn(100,100)
>>> out.cuda()
>>> out1 = torch.randn(90,90)
>>> torch.cuda.memory_allocated(0)
40448

I cannot reproduce the issue and get:

out = torch.randn(100,100)
out.cuda()
print(torch.cuda.memory_allocated(0))
> 0

out1 = torch.randn(100,100)
print(torch.cuda.memory_allocated(0))
> 0
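This is consistent with how CPython handles unassigned results: `out.cuda()` returns a new tensor, but since the result is never bound to a name, its reference count drops to zero and it is freed right away. A minimal CPU-only sketch of the same behavior, using `weakref` and a placeholder class (`Dummy` is just an illustration, not anything from PyTorch):

```python
import weakref

class Dummy:
    """Stand-in for a tensor-like object."""
    pass

# Create an object without binding it to a name; only a weak
# reference observes it, so CPython collects it immediately.
ref = weakref.ref(Dummy())
print(ref() is None)  # True: the unassigned object was already discarded
```

The same thing happens to the tensor returned by `out.cuda()` in a script, which is why the allocated memory reads 0.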

Are you using a REPL environment where the output of the last operation might still be stored in the _ object?

Yes, I was using the REPL env. Are there any docs/blog posts explaining what you just said here: that the output of the last operation is stored in the _ object?

Thanks in advance.

The underscore (_) is a special variable which holds the output of the last operation, as seen here:

a = 5

a
Out[2]: 5

6
Out[3]: 6

_
Out[4]: 6

_ + _
Out[5]: 12

If you are stepping through the code in a REPL environment, I would guess that this variable in particular can hold a reference to the CUDA tensor and could thus report the GPU usage. On the other hand, if you execute the entire script, the memory usage is shown as 0, which is expected: since you didn’t assign the CUDA tensor to any variable, it is discarded.
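The mechanism behind `_` is the interpreter's display hook: the REPL calls `sys.displayhook` on the result of every top-level expression, and the default hook stores any non-None result in `builtins._`. A short sketch reproducing this outside a REPL (the string value is just an example):

```python
import builtins
import sys

# sys.displayhook is what the interactive interpreter calls on each
# expression result; it prints the value and saves it in builtins._
sys.displayhook("some result")
print(builtins._)  # some result

# Rebinding or deleting _ releases that last reference, which is how
# a lingering CUDA tensor held by _ would eventually be freed.
del builtins._
```

So in a REPL session, evaluating `out.cuda()` as a bare expression leaves the CUDA tensor reachable through `_` until the next expression result replaces it.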