When I print memory_stats, the output below reports 300+ GB of memory usage, yet the code runs successfully and the system clearly does not have that much memory. How should I interpret this value, and how can I get an actual estimate of the memory my program needs?
I assume you are referring to the total allocations and freed tensors?
If so, note that these stats accumulate every allocation made over the lifetime of the program, as this small example shows:
import torch

x = torch.randn(1024, device="cuda")  # 1024 float32 values = 4 KiB
print(torch.cuda.memory_summary())
# |===========================================================================|
# | PyTorch CUDA memory summary, device ID 0 |
# |---------------------------------------------------------------------------|
# | CUDA OOMs: 0 | cudaMalloc retries: 0 |
# |===========================================================================|
# | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
# |---------------------------------------------------------------------------|
# | Allocated memory | 4096 B | 4096 B | 4096 B | 0 B |
# | from large pool | 0 B | 0 B | 0 B | 0 B |
# | from small pool | 4096 B | 4096 B | 4096 B | 0 B |
# |---------------------------------------------------------------------------|
# | Active memory | 4096 B | 4096 B | 4096 B | 0 B |
# | from large pool | 0 B | 0 B | 0 B | 0 B |
# | from small pool | 4096 B | 4096 B | 4096 B | 0 B |
# |---------------------------------------------------------------------------|
# | Requested memory | 4096 B | 4096 B | 4096 B | 0 B |
# | from large pool | 0 B | 0 B | 0 B | 0 B |
# | from small pool | 4096 B | 4096 B | 4096 B | 0 B |
# |---------------------------------------------------------------------------|
# | GPU reserved memory | 2048 KiB | 2048 KiB | 2048 KiB | 0 B |
# | from large pool | 0 KiB | 0 KiB | 0 KiB | 0 B |
# | from small pool | 2048 KiB | 2048 KiB | 2048 KiB | 0 B |
# |---------------------------------------------------------------------------|
# | Non-releasable memory | 2044 KiB | 2044 KiB | 2044 KiB | 0 B |
# | from large pool | 0 KiB | 0 KiB | 0 KiB | 0 B |
# | from small pool | 2044 KiB | 2044 KiB | 2044 KiB | 0 B |
# |---------------------------------------------------------------------------|
# | Allocations | 1 | 1 | 1 | 0 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 1 | 1 | 0 |
# |---------------------------------------------------------------------------|
# | Active allocs | 1 | 1 | 1 | 0 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 1 | 1 | 0 |
# |---------------------------------------------------------------------------|
# | GPU reserved segments | 1 | 1 | 1 | 0 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 1 | 1 | 0 |
# |---------------------------------------------------------------------------|
# | Non-releasable allocs | 1 | 1 | 1 | 0 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 1 | 1 | 0 |
# |---------------------------------------------------------------------------|
# | Oversize allocations | 0 | 0 | 0 | 0 |
# |---------------------------------------------------------------------------|
# | Oversize GPU segments | 0 | 0 | 0 | 0 |
# |===========================================================================|
# Rebinding x frees the previous tensor only after the new one is allocated,
# so Tot Alloc and Tot Freed keep growing while Cur Usage stays flat.
for _ in range(1024):
    x = torch.randn(1024, device="cuda")
print(torch.cuda.memory_summary())
# |===========================================================================|
# | PyTorch CUDA memory summary, device ID 0 |
# |---------------------------------------------------------------------------|
# | CUDA OOMs: 0 | cudaMalloc retries: 0 |
# |===========================================================================|
# | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
# |---------------------------------------------------------------------------|
# | Allocated memory | 4096 B | 8192 B | 4100 KiB | 4096 KiB |
# | from large pool | 0 B | 0 B | 0 KiB | 0 KiB |
# | from small pool | 4096 B | 8192 B | 4100 KiB | 4096 KiB |
# |---------------------------------------------------------------------------|
# | Active memory | 4096 B | 8192 B | 4100 KiB | 4096 KiB |
# | from large pool | 0 B | 0 B | 0 KiB | 0 KiB |
# | from small pool | 4096 B | 8192 B | 4100 KiB | 4096 KiB |
# |---------------------------------------------------------------------------|
# | Requested memory | 4096 B | 8192 B | 4100 KiB | 4096 KiB |
# | from large pool | 0 B | 0 B | 0 KiB | 0 KiB |
# | from small pool | 4096 B | 8192 B | 4100 KiB | 4096 KiB |
# |---------------------------------------------------------------------------|
# | GPU reserved memory | 2048 KiB | 2048 KiB | 2048 KiB | 0 B |
# | from large pool | 0 KiB | 0 KiB | 0 KiB | 0 B |
# | from small pool | 2048 KiB | 2048 KiB | 2048 KiB | 0 B |
# |---------------------------------------------------------------------------|
# | Non-releasable memory | 2044 KiB | 2044 KiB | 6140 KiB | 4096 KiB |
# | from large pool | 0 KiB | 0 KiB | 0 KiB | 0 KiB |
# | from small pool | 2044 KiB | 2044 KiB | 6140 KiB | 4096 KiB |
# |---------------------------------------------------------------------------|
# | Allocations | 1 | 2 | 1025 | 1024 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 2 | 1025 | 1024 |
# |---------------------------------------------------------------------------|
# | Active allocs | 1 | 2 | 1025 | 1024 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 2 | 1025 | 1024 |
# |---------------------------------------------------------------------------|
# | GPU reserved segments | 1 | 1 | 1 | 0 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 1 | 1 | 0 |
# |---------------------------------------------------------------------------|
# | Non-releasable allocs | 1 | 2 | 513 | 512 |
# | from large pool | 0 | 0 | 0 | 0 |
# | from small pool | 1 | 2 | 513 | 512 |
# |---------------------------------------------------------------------------|
# | Oversize allocations | 0 | 0 | 0 | 0 |
# |---------------------------------------------------------------------------|
# | Oversize GPU segments | 0 | 0 | 0 | 0 |
# |===========================================================================|
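The Tot Alloc and Tot Freed numbers above can be reproduced with plain arithmetic. This is just a sketch of how the counters evolve, not PyTorch's actual allocator code:

```python
# Simulate the Cur Usage / Peak Usage / Tot Alloc / Tot Freed counters for the
# example above: one initial 4096 B tensor, then 1024 loop iterations where a
# new tensor is allocated before the old one is freed (x is rebound, so the
# old tensor dies only after the new one exists).
cur = peak = tot_alloc = tot_freed = 0
TENSOR_BYTES = 1024 * 4  # 1024 float32 values

def alloc(n):
    global cur, peak, tot_alloc
    cur += n
    tot_alloc += n
    peak = max(peak, cur)

def free(n):
    global cur, tot_freed
    cur -= n
    tot_freed += n

alloc(TENSOR_BYTES)           # x = torch.randn(1024, device="cuda")
for _ in range(1024):
    alloc(TENSOR_BYTES)       # the new tensor is created first...
    free(TENSOR_BYTES)        # ...then the previous x is released

print(cur, peak, tot_alloc // 1024, tot_freed // 1024)
# → 4096 8192 4100 4096  (Cur Usage, Peak Usage, Tot Alloc KiB, Tot Freed KiB)
```

The simulated counters match the summary exactly: Tot Alloc reaches 4100 KiB after only 4 MiB of real traffic, which is how a long-running program can report hundreds of GB "allocated" in total while never holding more than a few KB at once.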
Your currently allocated memory is shown in the Cur Usage column of Allocated memory, and the cached memory in the Cur Usage column of GPU reserved memory. The Tot Alloc column accumulates every allocation ever made, which is why it can far exceed the physical memory. For an actual estimate of the memory your program needs, look at the Peak Usage column instead.
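You can also query these counters programmatically instead of parsing the summary. A minimal sketch (guarded so it degrades gracefully on a machine without a CUDA device):

```python
try:
    import torch
    cuda_ok = torch.cuda.is_available()
except ImportError:
    cuda_ok = False

if cuda_ok:
    torch.cuda.reset_peak_memory_stats()       # start peak tracking from here
    x = torch.randn(1024, device="cuda")
    allocated = torch.cuda.memory_allocated()  # bytes currently held by tensors
    reserved = torch.cuda.memory_reserved()    # bytes cached by the allocator
    peak = torch.cuda.max_memory_allocated()   # high-water mark in bytes
    print(f"allocated={allocated} B, reserved={reserved} B, peak={peak} B")
else:
    print("No CUDA device available; skipping the measurement.")
```

The peak value (`max_memory_allocated`) is the number to size your GPU by: call `reset_peak_memory_stats()` before the workload you care about, run it, then read the peak afterwards.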