PyTorch's `torch.cuda.max_memory_allocated()` showing different results from `nvidia-smi`?

I’m currently building my own custom GPU report and am using `torch.cuda.max_memory_allocated(device_id)` to get the maximum memory that each GPU uses. However, I’ve noticed that this number is different from what I see when I run `nvidia-smi` during the process.

According to the documentation for `torch.cuda.max_memory_allocated`, the returned integer is in bytes. From what I’ve found online, to convert bytes to gigabytes (strictly, GiB) you divide by 1024 ** 3, so I’m currently doing `round(max_mem / (1024 ** 3), 2)`.
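For reference, here is roughly what my reporting code does (a minimal sketch; the helper name `peak_mem_gib` is just for illustration, the rest of the report generation is omitted):

```python
import torch

def peak_mem_gib(device_id: int) -> float:
    # Peak memory allocated by tensors on this device, in bytes.
    max_mem = torch.cuda.max_memory_allocated(device_id)
    # Convert bytes -> GiB and round to two decimal places.
    return round(max_mem / (1024 ** 3), 2)
```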

Am I doing the calculation wrong, or am I misunderstanding how `torch.cuda.max_memory_allocated` works entirely? The peak memory usage I observed on one GPU during the process was around 32 GB, but `torch.cuda.max_memory_allocated(0) / (1024 ** 3)` returns around 13.5 GB.

It’s not expected for these two values to match, for a couple of reasons. Because the current CUDA memory allocator is a caching allocator, it reserves more memory than is currently occupied by tensors (which is what `torch.cuda.max_memory_allocated` reports) in order to speed up future allocations by reusing memory from garbage-collected tensors instead of returning it to the device. The reserved memory (reported by `torch.cuda.max_memory_reserved`) will be closer to what `nvidia-smi` shows, but keep in mind that there is additional overhead depending on which operations and libraries are being used (e.g., cuDNN and cuBLAS kernels will use memory). That overhead can be substantial, on the order of hundreds of MiB, in many cases.
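To see the difference directly, you can print the allocator counters next to a driver-level query (a minimal sketch; note that `torch.cuda.mem_get_info` returns the *current* free/total device memory rather than a peak, and it covers the whole GPU, including the CUDA context and any other processes):

```python
import torch

device = torch.device("cuda:0")

# Peak memory occupied by live tensors (what max_memory_allocated tracks).
allocated = torch.cuda.max_memory_allocated(device)

# Peak memory held by the caching allocator, including cached-but-unused blocks.
reserved = torch.cuda.max_memory_reserved(device)

# Current free/total memory as seen by the driver; total - free also includes
# the CUDA context, cuDNN/cuBLAS workspaces, and other processes on the GPU,
# which is roughly what nvidia-smi reports for the device.
free, total = torch.cuda.mem_get_info(device)

gib = 1024 ** 3
print(f"max allocated: {allocated / gib:.2f} GiB")
print(f"max reserved:  {reserved / gib:.2f} GiB")
print(f"used (driver): {(total - free) / gib:.2f} GiB")
```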
