I’m trying to investigate the reason for high GPU memory usage in my code.
To do so, I would like to list all allocated tensors/storages, whether created explicitly or within autograd. The closest thing I found is Soumith’s snippet that iterates over all tensors known to the garbage collector.
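For reference, what I am running is essentially this (a minimal sketch based on `gc.get_objects()`; the `report_tensors` name and the exact filtering are my adaptation):

```python
import gc
import torch

def report_tensors():
    """Walk all objects tracked by the garbage collector and report
    every torch tensor, including tensors reachable via a .data attribute."""
    count = 0
    total_elements = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                t = obj
            elif hasattr(obj, 'data') and torch.is_tensor(obj.data):
                t = obj.data
            else:
                continue
            print(type(obj), tuple(t.size()))
            count += 1
            total_elements += t.numel()
        except Exception:
            # some tracked objects raise on attribute access; skip them
            pass
    return count, total_elements
```

Calling `report_tensors()` prints the type and shape of each tensor found and returns how many tensors were seen and their total element count.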
However, something must be missing. For example, I run `python -m pdb -c continue` to break at a CUDA out-of-memory error (with or without `CUDA_LAUNCH_BLOCKING=1`). At that point, `nvidia-smi` reports around 9 GB being occupied. In the snippet I sum the `.numel()` of every tensor found and get 17,092,783 elements, which at a maximum of 8 bytes per element gives only ~130 MB. In particular, many autograd Variables (intermediate computation results) seem to be missing from the list. Can anyone give me a hint? Thanks!