Discrepancy between NVIDIA-smi and torch.cuda.memory_allocated()

nvidia-smi reports 37 GB allocated, but torch.cuda.memory_allocated() reports 15 GB. I'm curious: why is there a discrepancy?

From the PyTorch docs (torch.cuda.memory_allocated — PyTorch 2.1 documentation):

This is likely less than the amount shown in nvidia-smi since some unused memory can be held by the caching allocator and some context needs to be created on GPU. See Memory management for more details about GPU memory management.

TL;DR: torch.cuda.memory_allocated() reports only the memory currently occupied by live tensors, whereas nvidia-smi additionally shows the CUDA context and the memory held (but not necessarily in use) by PyTorch's caching allocator.
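A minimal sketch that makes the difference visible (the tensor size here is just an illustration; the exact numbers will depend on your GPU and workload):

```python
import torch

# Allocate a large tensor on the GPU (~4 GB of float32).
x = torch.randn(1024, 1024, 1024, device="cuda")

# Memory occupied by live tensors: what memory_allocated() reports.
allocated = torch.cuda.memory_allocated() / 1024**3

# Memory reserved by the caching allocator, including cached-but-unused
# blocks. This is closer to the nvidia-smi figure, which additionally
# includes the CUDA context (a few hundred MB, driver-dependent).
reserved = torch.cuda.memory_reserved() / 1024**3

print(f"allocated: {allocated:.2f} GiB, reserved: {reserved:.2f} GiB")

del x
# The tensor is now freed from the allocator's point of view...
print(torch.cuda.memory_allocated())  # drops back toward 0
# ...but the caching allocator keeps the blocks for reuse, so
# nvidia-smi stays unchanged.
print(torch.cuda.memory_reserved())   # still high

# empty_cache() hands cached blocks back to the driver; only then does
# the nvidia-smi number go down (the CUDA context itself is never freed
# while the process lives).
torch.cuda.empty_cache()
print(torch.cuda.memory_reserved())
```

So the gap you're seeing is expected: nvidia-smi ≈ tensors + cached blocks + CUDA context, while memory_allocated() counts only the first term.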
