I am training a model on Google Colab.
Here is a snippet of code from the training pipeline:
for epoch in range(num_epoches):
    for index, batched_data in enumerate(dataloder):
        inputs = {}
        for key, val in batched_data.items():
            inputs[key] = torch.squeeze(val, dim=0)
        my_optimizer.zero_grad()
        output = model(inputs)
        loss = model.loss(inputs)
        if index % 10 == 0:
            print("index {}, loss: {}".format(index, loss))
            print(torch.cuda.memory_summary())
        loss.backward()
        print(loss.item())
        my_optimizer.step()
        my_lr_scheduler.step()
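To watch how memory behaves per batch without dumping the whole summary, the allocator counters can also be sampled directly. A minimal sketch (the helper name log_cuda_mem is just illustrative, not something from my pipeline):

import torch

def log_cuda_mem(step, device=0):
    # Illustrative helper: sample the cumulative allocator counters that
    # back the "Tot Alloc" / "Tot Freed" columns of memory_summary().
    stats = torch.cuda.memory_stats(device)
    print(
        "step {}: cur {:.0f} MB, tot alloc {:.0f} GB, tot freed {:.0f} GB".format(
            step,
            stats["allocated_bytes.all.current"] / 2**20,
            stats["allocated_bytes.all.allocated"] / 2**30,
            stats["allocated_bytes.all.freed"] / 2**30,
        )
    )

It could be called from the `if index % 10 == 0:` block in place of the full `memory_summary()` print.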
Here is one of the memory summaries:
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 2202 MB | 2230 MB | 1002 GB | 999 GB |
| from large pool | 2009 MB | 2036 MB | 922 GB | 920 GB |
| from small pool | 193 MB | 194 MB | 79 GB | 79 GB |
|---------------------------------------------------------------------------|
| Active memory | 2202 MB | 2230 MB | 1002 GB | 999 GB |
| from large pool | 2009 MB | 2036 MB | 922 GB | 920 GB |
| from small pool | 193 MB | 194 MB | 79 GB | 79 GB |
|---------------------------------------------------------------------------|
| GPU reserved memory | 2378 MB | 2378 MB | 2378 MB | 0 B |
| from large pool | 2160 MB | 2160 MB | 2160 MB | 0 B |
| from small pool | 218 MB | 218 MB | 218 MB | 0 B |
|---------------------------------------------------------------------------|
| Non-releasable memory | 177235 KB | 1088 MB | 897 GB | 897 GB |
| from large pool | 154040 KB | 987 MB | 811 GB | 811 GB |
| from small pool | 23195 KB | 103 MB | 85 GB | 85 GB |
|---------------------------------------------------------------------------|
| Allocations | 1735 | 1746 | 582346 | 580611 |
| from large pool | 310 | 316 | 166280 | 165970 |
| from small pool | 1425 | 1435 | 416066 | 414641 |
|---------------------------------------------------------------------------|
| Active allocs | 1735 | 1746 | 582346 | 580611 |
| from large pool | 310 | 316 | 166280 | 165970 |
| from small pool | 1425 | 1435 | 416066 | 414641 |
|---------------------------------------------------------------------------|
| GPU reserved segments | 226 | 226 | 226 | 0 |
| from large pool | 117 | 117 | 117 | 0 |
| from small pool | 109 | 109 | 109 | 0 |
|---------------------------------------------------------------------------|
| Non-releasable allocs | 170 | 305 | 349860 | 349690 |
| from large pool | 61 | 132 | 107260 | 107199 |
| from small pool | 109 | 179 | 242600 | 242491 |
|===========================================================================|
The Tot Alloc and Tot Freed values increased after each batch, and the program paused after 2 epochs.
What do the terms ‘Tot Alloc’ and ‘Tot Freed’ mean?
Why did the program pause after some epochs?
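For reference, here is a tiny standalone experiment I can describe (assuming a CUDA device is available, and using the torch.cuda.memory_stats() keys that appear to back these columns) to see how the counters behave when a tensor is repeatedly allocated and released:

import torch

# Repeatedly allocate and release a same-sized tensor and print the
# cumulative counters alongside the current usage.
for i in range(5):
    x = torch.randn(1024, 1024, device="cuda")  # ~4 MB fp32 tensor
    del x
    stats = torch.cuda.memory_stats()
    print(
        "iter {}: cur {} B, tot alloc {} B, tot freed {} B".format(
            i,
            stats["allocated_bytes.all.current"],
            stats["allocated_bytes.all.allocated"],
            stats["allocated_bytes.all.freed"],
        )
    )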