Interpreting the memory summary

I only have a laptop GPU (NVIDIA RTX™ A2000, 4 GB GDDR6),
so how can the memory used be 175172 MB, as printed below?


|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |       0 B  |    1159 MB |  175172 MB |  175172 MB |
|       from large pool |       0 B  |    1158 MB |  170754 MB |  170754 MB |
|       from small pool |       0 B  |       4 MB |    4417 MB |    4417 MB |
|---------------------------------------------------------------------------|
| Active memory         |       0 B  |    1159 MB |  175172 MB |  175172 MB |
|       from large pool |       0 B  |    1158 MB |  170754 MB |  170754 MB |
|       from small pool |       0 B  |       4 MB |    4417 MB |    4417 MB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |     876 MB |    1186 MB |    2736 MB |    1860 MB |
|       from large pool |     872 MB |    1180 MB |    2720 MB |    1848 MB |
|       from small pool |       4 MB |       6 MB |      16 MB |      12 MB |
|---------------------------------------------------------------------------|
| Non-releasable memory |       0 B  |  105782 KB |  104516 MB |  104516 MB |
|       from large pool |       0 B  |  102456 KB |   98883 MB |   98883 MB |
|       from small pool |       0 B  |    4054 KB |    5633 MB |    5633 MB |
|---------------------------------------------------------------------------|
| Allocations           |       0    |     264    |  155180    |  155180    |
|       from large pool |       0    |      33    |    8510    |    8510    |
|       from small pool |       0    |     232    |  146670    |  146670    |
|===========================================================================|

Tot Alloc and Tot Freed are running totals: they accumulate every allocation and free over the lifetime of the process, so they can far exceed the physical memory of the device as long as the current usage (allocated minus freed) stays within it. The Cur (current) and Peak usage columns are likely the ones you are interested in.
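To see why the running totals behave this way, here is a toy model of the bookkeeping (the class and field names are illustrative, not the PyTorch internals): repeatedly allocating and freeing a small block drives Tot Alloc up without bound, while Cur and Peak stay tiny.

```python
class AllocStats:
    """Toy model of per-device allocator counters (not PyTorch's code)."""

    def __init__(self):
        self.cur = 0        # bytes currently allocated ("Cur Usage")
        self.peak = 0       # high-water mark of cur ("Peak Usage")
        self.tot_alloc = 0  # sum of all allocations ever made ("Tot Alloc")
        self.tot_freed = 0  # sum of all frees ever made ("Tot Freed")

    def alloc(self, nbytes):
        self.cur += nbytes
        self.peak = max(self.peak, self.cur)
        self.tot_alloc += nbytes

    def free(self, nbytes):
        self.cur -= nbytes
        self.tot_freed += nbytes


stats = AllocStats()

# Allocate and free a 4 KiB block 256 * 1024 times: the running total
# reaches 1 GiB even though at most 4 KiB is ever in use at once.
for _ in range(256 * 1024):
    stats.alloc(4 * 1024)
    stats.free(4 * 1024)

print(stats.cur)        # 0  -> nothing currently allocated
print(stats.peak)       # 4096  -> at most one 4 KiB block was live
print(stats.tot_alloc)  # 1073741824  -> 1 GiB accumulated in total
```

The same arithmetic explains the summary above: a 4 GB card can show 175172 MB of Tot Alloc because that number counts every allocation ever made, not simultaneous usage.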

Here is a small example:

import torch

print(torch.cuda.memory_summary())
# |===========================================================================|
# |                  PyTorch CUDA memory summary, device ID 0                 |
# |---------------------------------------------------------------------------|
# |            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
# |===========================================================================|
# |        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
# |---------------------------------------------------------------------------|
# | Allocated memory      |      0 B   |      0 B   |      0 B   |      0 B   |
# |       from large pool |      0 B   |      0 B   |      0 B   |      0 B   |
# |       from small pool |      0 B   |      0 B   |      0 B   |      0 B   |
# |---------------------------------------------------------------------------|

# allocate a total of 1 GiB across all iterations
for _ in range(256 * 1024):
    # x holds 4 KiB of data (1024 float32 values)
    x = torch.randn(1024, device="cuda")

print(torch.cuda.memory_summary())
# |===========================================================================|
# |                  PyTorch CUDA memory summary, device ID 0                 |
# |---------------------------------------------------------------------------|
# |            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
# |===========================================================================|
# |        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
# |---------------------------------------------------------------------------|
# | Allocated memory      |   4096 B   |   8192 B   |   1024 MiB |   1023 MiB |
# |       from large pool |      0 B   |      0 B   |      0 MiB |      0 MiB |
# |       from small pool |   4096 B   |   8192 B   |   1024 MiB |   1023 MiB |
# |---------------------------------------------------------------------------|