When I print memory_stats, the output below reports 300+ GB of memory usage, yet the code runs successfully and the system clearly does not have that much memory. How should I interpret this value, and how can I get an actual estimate of the memory my program needs?
I assume you are referring to the total allocations and freed tensors?
If so, note that these stats accumulate every allocation made over the lifetime of the program, as this small example shows:
import torch

x = torch.randn(1024, device="cuda")  # 1024 float32 values = 4 KiB
print(torch.cuda.memory_summary())
# |===========================================================================|
# | PyTorch CUDA memory summary, device ID 0 |
# |---------------------------------------------------------------------------|
# | CUDA OOMs: 0 | cudaMalloc retries: 0 |
# |===========================================================================|
# | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
# |---------------------------------------------------------------------------|
# | Allocated memory | 4096 B | 4096 B | 4096 B | 0 B |
# | from large pool | 0 B | 0 B | 0 B | 0 B |
# | from small pool | 4096 B | 4096 B | 4096 B | 0 B |
# |---------------------------------------------------------------------------|
# | Active memory | 4096 B | 4096 B | 4096 B | 0 B |
# | from large pool | 0 B | 0 B | 0 B | 0 B |
# | from small pool | 4096 B | 4096 B | 4096 B | 0 B |
# |---------------------------------------------------------------------------|
# | Requested memory | 4096 B | 4096 B | 4096 B | 0 B |
# | from large pool | 0 B | 0 B | 0 B | 0 B |
# | from small pool | 4096 B | 4096 B | 4096 B | 0 B |
# |---------------------------------------------------------------------------|
# | GPU reserved memory | 2048 KiB | 2048 KiB | 2048 KiB | 0 B |
# | from large pool | 0 KiB | 0 KiB | 0 KiB | 0 B |
# | from small pool | 2048 KiB | 2048 KiB | 2048 KiB | 0 B |
# |---------------------------------------------------------------------------|
# | Non-releasable memory | 2044 KiB | 2044 KiB | 2044 KiB | 0 B |
# | from large pool | 0 KiB | 0 KiB | 0 KiB | 0 B |
# | from small pool | 2044 KiB | 2044 KiB | 2044 KiB | 0 B |
# |---------------------------------------------------------------------------|
# | Allocations | 1 | 1 | 1 | 0 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 1 | 1 | 0 |
# |---------------------------------------------------------------------------|
# | Active allocs | 1 | 1 | 1 | 0 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 1 | 1 | 0 |
# |---------------------------------------------------------------------------|
# | GPU reserved segments | 1 | 1 | 1 | 0 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 1 | 1 | 0 |
# |---------------------------------------------------------------------------|
# | Non-releasable allocs | 1 | 1 | 1 | 0 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 1 | 1 | 0 |
# |---------------------------------------------------------------------------|
# | Oversize allocations | 0 | 0 | 0 | 0 |
# |---------------------------------------------------------------------------|
# | Oversize GPU segments | 0 | 0 | 0 | 0 |
# |===========================================================================|
# Rebinding x frees the previous tensor only after the new one is allocated,
# so Tot Alloc and Tot Freed keep growing while Cur Usage stays flat.
for _ in range(1024):
    x = torch.randn(1024, device="cuda")
print(torch.cuda.memory_summary())
# |===========================================================================|
# | PyTorch CUDA memory summary, device ID 0 |
# |---------------------------------------------------------------------------|
# | CUDA OOMs: 0 | cudaMalloc retries: 0 |
# |===========================================================================|
# | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
# |---------------------------------------------------------------------------|
# | Allocated memory | 4096 B | 8192 B | 4100 KiB | 4096 KiB |
# | from large pool | 0 B | 0 B | 0 KiB | 0 KiB |
# | from small pool | 4096 B | 8192 B | 4100 KiB | 4096 KiB |
# |---------------------------------------------------------------------------|
# | Active memory | 4096 B | 8192 B | 4100 KiB | 4096 KiB |
# | from large pool | 0 B | 0 B | 0 KiB | 0 KiB |
# | from small pool | 4096 B | 8192 B | 4100 KiB | 4096 KiB |
# |---------------------------------------------------------------------------|
# | Requested memory | 4096 B | 8192 B | 4100 KiB | 4096 KiB |
# | from large pool | 0 B | 0 B | 0 KiB | 0 KiB |
# | from small pool | 4096 B | 8192 B | 4100 KiB | 4096 KiB |
# |---------------------------------------------------------------------------|
# | GPU reserved memory | 2048 KiB | 2048 KiB | 2048 KiB | 0 B |
# | from large pool | 0 KiB | 0 KiB | 0 KiB | 0 B |
# | from small pool | 2048 KiB | 2048 KiB | 2048 KiB | 0 B |
# |---------------------------------------------------------------------------|
# | Non-releasable memory | 2044 KiB | 2044 KiB | 6140 KiB | 4096 KiB |
# | from large pool | 0 KiB | 0 KiB | 0 KiB | 0 KiB |
# | from small pool | 2044 KiB | 2044 KiB | 6140 KiB | 4096 KiB |
# |---------------------------------------------------------------------------|
# | Allocations | 1 | 2 | 1025 | 1024 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 2 | 1025 | 1024 |
# |---------------------------------------------------------------------------|
# | Active allocs | 1 | 2 | 1025 | 1024 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 2 | 1025 | 1024 |
# |---------------------------------------------------------------------------|
# | GPU reserved segments | 1 | 1 | 1 | 0 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 1 | 1 | 0 |
# |---------------------------------------------------------------------------|
# | Non-releasable allocs | 1 | 2 | 513 | 512 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 2 | 513 | 512 |
# |---------------------------------------------------------------------------|
# | Oversize allocations | 0 | 0 | 0 | 0 |
# |---------------------------------------------------------------------------|
# | Oversize GPU segments | 0 | 0 | 0 | 0 |
# |===========================================================================|
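The Tot Alloc and Tot Freed numbers above can be reproduced with plain arithmetic. This is just a sketch of how the counters evolve, not PyTorch's actual allocator code:

```python
# Simulate the Cur Usage / Peak Usage / Tot Alloc / Tot Freed counters for the
# example above: one initial 4096 B tensor, then 1024 loop iterations where a
# new tensor is allocated before the old one is freed (x is rebound, so the
# old tensor dies only after the new one exists).
cur = peak = tot_alloc = tot_freed = 0
TENSOR_BYTES = 1024 * 4  # 1024 float32 values

def alloc(n):
    global cur, peak, tot_alloc
    cur += n
    tot_alloc += n
    peak = max(peak, cur)

def free(n):
    global cur, tot_freed
    cur -= n
    tot_freed += n

alloc(TENSOR_BYTES)           # x = torch.randn(1024, device="cuda")
for _ in range(1024):
    alloc(TENSOR_BYTES)       # the new tensor is created first...
    free(TENSOR_BYTES)        # ...then the previous x is released

print(cur, peak, tot_alloc // 1024, tot_freed // 1024)
# → 4096 8192 4100 4096  (Cur Usage, Peak Usage, Tot Alloc KiB, Tot Freed KiB)
```

The simulated counters match the summary exactly: Tot Alloc reaches 4100 KiB after only 4 MiB of real traffic, which is how a long-running program can report hundreds of GB "allocated" in total while never holding more than a few KB at once.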
Your currently allocated memory is shown in the Cur Usage column of Allocated memory, and the cached memory in the Cur Usage column of GPU reserved memory. The Tot Alloc column accumulates every allocation ever made, which is why it can far exceed the physical memory. For an actual estimate of the memory your program needs, look at the Peak Usage column instead.
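You can also query these counters programmatically instead of parsing the summary. A minimal sketch (guarded so it degrades gracefully on a machine without a CUDA device):

```python
try:
    import torch
    cuda_ok = torch.cuda.is_available()
except ImportError:
    cuda_ok = False

if cuda_ok:
    torch.cuda.reset_peak_memory_stats()       # start peak tracking from here
    x = torch.randn(1024, device="cuda")
    allocated = torch.cuda.memory_allocated()  # bytes currently held by tensors
    reserved = torch.cuda.memory_reserved()    # bytes cached by the allocator
    peak = torch.cuda.max_memory_allocated()   # high-water mark in bytes
    print(f"allocated={allocated} B, reserved={reserved} B, peak={peak} B")
else:
    print("No CUDA device available; skipping the measurement.")
```

The peak value (`max_memory_allocated`) is the number to size your GPU by: call `reset_peak_memory_stats()` before the workload you care about, run it, then read the peak afterwards.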