Hi,
I’m trying to investigate the cause of high GPU memory usage in my code.
For that, I would like to list all allocated tensors/storages, whether created explicitly or within autograd. The closest thing I found is Soumith’s snippet that iterates over all tensors known to the garbage collector.
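For reference, the version I’m running is roughly the following sketch (adapted from that snippet; the helper name `dump_tensors` and the returned total are my own additions):

```python
import gc
import torch

def dump_tensors():
    """Print every tensor the garbage collector knows about and
    return the total number of elements across all of them."""
    total = 0
    for obj in gc.get_objects():
        try:
            # Plain tensors, plus objects (e.g. old-style Variables)
            # that wrap a tensor in a .data attribute.
            if torch.is_tensor(obj):
                t = obj
            elif hasattr(obj, "data") and torch.is_tensor(obj.data):
                t = obj.data
            else:
                continue
            print(type(obj), tuple(t.size()), t.dtype, t.device)
            total += t.numel()
        except Exception:
            # Some gc-tracked objects raise on attribute access; skip them.
            pass
    print("total elements:", total)
    return total
```
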
However, something must be missing. For example, I run `python -m pdb -c continue` to break at a CUDA out-of-memory error (with or without `CUDA_LAUNCH_BLOCKING=1`). At that point, `nvidia-smi` reports around 9 GB occupied. In the snippet I sum the `.numel()`s of all tensors found and get 17092783 elements, which at a maximum size of 8 B per element gives ~130 MB. In particular, many autograd Variables (intermediate computations) seem to be missing from the list. Can anyone give me a hint? Thanks!