Torch tensor memory size discrepancy

Hi! So I’m using "watch -n 0.1 nvidia-smi" to try to debug a memory leak I’m having. I set a pdb.set_trace() at the very beginning of my main file and then did:

test = torch.zeros((4,400,400))
test = test.to(1)

I then print:

torch.cuda.memory_allocated(device=1)
and get:
2560000
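
(That matches the raw tensor size, assuming the default float32 dtype: 4 * 400 * 400 elements * 4 bytes each = 2,560,000 bytes.)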

All good so far. However, my nvidia-smi watch suddenly shows GPU #2 (ignore the fact that it’s not 1; there’s a mismatch between the torch GPU numbering and nvidia-smi’s) using around 550MiB of memory!!!

I then do
test2 = torch.zeros((4,400,400))
test2 = test2.to(1)

So again, I print:
torch.cuda.memory_allocated(device=1)
and get:
2880000

But this time, my nvidia-smi watch doesn’t change at all!
Clearly, torch.cuda.memory_allocated is correct, but I’m not sure what’s going on with nvidia-smi… This is a problem because, due to this memory leak, I can’t run my code.

Thanks!

The difference is mainly caused by the CUDA runtime, which eats hundreds of megabytes once it is (lazily) initialized; that’s mostly because there are a lot of CUDA kernels in the torch library. Also, the torch memory allocator allocates memory in bigger blocks, so you should also consider cached memory when tracking usage (try torch.cuda.memory_summary()).
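
Here’s a minimal sketch of how you could compare the different counters (assuming device 1, as in your snippet; on older torch versions memory_reserved was called memory_cached):

import torch

device = torch.device("cuda:1")

# Force the lazy CUDA context creation; this alone makes nvidia-smi
# jump by a few hundred MiB even though no tensor exists yet
torch.cuda.init()

test = torch.zeros((4, 400, 400), device=device)

# Bytes currently occupied by live tensors (what you printed)
print(torch.cuda.memory_allocated(device))

# Bytes held by torch's caching allocator, including blocks kept around
# for reuse; this is closer to (but still below) what nvidia-smi shows,
# since nvidia-smi also counts the CUDA context itself
print(torch.cuda.memory_reserved(device))

# Full breakdown of allocated vs. cached memory
print(torch.cuda.memory_summary(device))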
