The Tot Alloc and Tot Freed increased after each batch

I am training a model on Google Colab.
Here is a snippet of code from the training pipeline.

    for epoch in range(num_epoches):
        for index, batched_data in enumerate(dataloader):
            inputs = {}
            for key, val in batched_data.items():
                inputs[key] = torch.squeeze(val, dim=0)
            my_optimizer.zero_grad()
            output = model(inputs)
            loss = model.loss(inputs)
            # log the loss and the CUDA memory stats every 10 batches
            if index % 10 == 0:
                print("index {}, loss: {}".format(index, loss))
                print(torch.cuda.memory_summary())
            loss.backward()
            print(loss.item())
            my_optimizer.step()
        my_lr_scheduler.step()

Here is one of the memory summaries.

|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |    2202 MB |    2230 MB |    1002 GB |     999 GB |
|       from large pool |    2009 MB |    2036 MB |     922 GB |     920 GB |
|       from small pool |     193 MB |     194 MB |      79 GB |      79 GB |
|---------------------------------------------------------------------------|
| Active memory         |    2202 MB |    2230 MB |    1002 GB |     999 GB |
|       from large pool |    2009 MB |    2036 MB |     922 GB |     920 GB |
|       from small pool |     193 MB |     194 MB |      79 GB |      79 GB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |    2378 MB |    2378 MB |    2378 MB |       0 B  |
|       from large pool |    2160 MB |    2160 MB |    2160 MB |       0 B  |
|       from small pool |     218 MB |     218 MB |     218 MB |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |  177235 KB |    1088 MB |     897 GB |     897 GB |
|       from large pool |  154040 KB |     987 MB |     811 GB |     811 GB |
|       from small pool |   23195 KB |     103 MB |      85 GB |      85 GB |
|---------------------------------------------------------------------------|
| Allocations           |    1735    |    1746    |  582346    |  580611    |
|       from large pool |     310    |     316    |  166280    |  165970    |
|       from small pool |    1425    |    1435    |  416066    |  414641    |
|---------------------------------------------------------------------------|
| Active allocs         |    1735    |    1746    |  582346    |  580611    |
|       from large pool |     310    |     316    |  166280    |  165970    |
|       from small pool |    1425    |    1435    |  416066    |  414641    |
|---------------------------------------------------------------------------|
| GPU reserved segments |     226    |     226    |     226    |       0    |
|       from large pool |     117    |     117    |     117    |       0    |
|       from small pool |     109    |     109    |     109    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |     170    |     305    |  349860    |  349690    |
|       from large pool |      61    |     132    |  107260    |  107199    |
|       from small pool |     109    |     179    |  242600    |  242491    |
|===========================================================================|

The Tot Alloc and Tot Freed increased after each batch, and the program paused after 2 epochs.
What do the terms ‘Tot Alloc’ and ‘Tot Freed’ mean?
Why did the program pause after some epochs?

Tot Alloc and Tot Freed show the total amount of memory allocated and freed during the current script execution.
E.g. in this code snippet I'm repeatedly recreating a single tensor, which gets deleted once f() returns:

    import torch

    def f():
        # temporary tensor, freed as soon as f() returns
        x = torch.randn(1024, device='cuda')

    print(torch.cuda.memory_summary())
    for _ in range(1000):
        f()
    print(torch.cuda.memory_summary())

|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |       0 B  |    4096 B  |    4000 KB |    4000 KB |
|       from large pool |       0 B  |       0 B  |       0 KB |       0 KB |
|       from small pool |       0 B  |    4096 B  |    4000 KB |    4000 KB |
|---------------------------------------------------------------------------|

As you can see, no memory is currently used and the Tot counters show the 1000 * 4KB allocations and deletions.
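
In other words, the Cur Usage column (what torch.cuda.memory_allocated() reports) is the number to watch for leaks, while the Tot counters only ever accumulate. A minimal sketch along the same lines as the demo above; the commented values assume a single 1024-element float32 tensor, so roughly 4 KB per allocation:

    import torch

    def f():
        # temporary tensor, freed as soon as f() returns
        x = torch.randn(1024, device='cuda')

    torch.cuda.reset_peak_memory_stats()
    for _ in range(1000):
        f()

    # current usage is back to 0 even though ~4 MB were allocated in total;
    # the peak shows only one small tensor was alive at any point in time
    print(torch.cuda.memory_allocated())      # 0
    print(torch.cuda.max_memory_allocated())  # typically 4096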


Thanks for the demo, I understand the meaning of Tot Alloc and Tot Freed now.
So GPU memory may not be the cause of the program's pause.
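To double-check, I can log the current (not the cumulative) counters once per epoch. A rough sketch, where train_one_epoch() is a hypothetical helper wrapping the inner batch loop from my code above:

    import torch

    for epoch in range(num_epoches):
        train_one_epoch()  # hypothetical helper: runs the inner batch loop
        my_lr_scheduler.step()
        # if these values stay roughly constant from epoch to epoch,
        # the training loop is not leaking GPU memory
        print("epoch {}: {:.1f} MB allocated, {:.1f} MB reserved".format(
            epoch,
            torch.cuda.memory_allocated() / 1024**2,
            torch.cuda.memory_reserved() / 1024**2))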