PyTorch's `torch.cuda.max_memory_allocated()` showing different results from `nvidia-smi`?

It’s not expected for these two values to match, for several reasons. PyTorch’s CUDA memory allocator is a caching allocator: it reserves more memory than is currently occupied by tensors (which is what `torch.cuda.max_memory_allocated()` reports) so that future allocations are fast and memory from garbage-collected tensors can be reused without round trips to `cudaMalloc`/`cudaFree`. The reserved memory, reported by `torch.cuda.max_memory_reserved()`, will be closer to what `nvidia-smi` shows, but keep in mind that there is additional overhead depending on which operations/libraries are used (e.g., cuDNN and cuBLAS kernels use memory). The latter can be substantial (on the order of hundreds of MiB) in many cases.
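
As a quick sanity check (a minimal sketch, assuming a CUDA-capable device is available; the exact numbers are illustrative and depend on your GPU, driver, and allocator state), you can print both counters side by side and compare them against what `nvidia-smi` reports:

```python
import torch

# Allocate a tensor so the caching allocator has something to track.
x = torch.randn(1024, 1024, device="cuda")  # ~4 MiB of float32 data

# Peak memory actually occupied by tensors.
allocated = torch.cuda.max_memory_allocated()
# Peak memory held ("reserved") by the caching allocator,
# including cached blocks not currently backing any tensor.
reserved = torch.cuda.max_memory_reserved()

print(f"max allocated: {allocated / 1024**2:.1f} MiB")
print(f"max reserved:  {reserved / 1024**2:.1f} MiB")

# nvidia-smi will typically show an even larger number, since it also
# includes the CUDA context and any library workspaces (cuDNN, cuBLAS, etc.),
# which PyTorch's counters do not account for.
```

You should see `reserved >= allocated`, and `nvidia-smi` larger than both.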
