After 4 epochs of training I get a CUDA out-of-memory error.
I am using the Hugging Face Wav2Vec2 model with a PyTorch training setup.
CUDA memory summary initially:
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 369906 KB | 369906 KB | 369906 KB | 0 B |
| from large pool | 368384 KB | 368384 KB | 368384 KB | 0 B |
| from small pool | 1522 KB | 1522 KB | 1522 KB | 0 B |
|---------------------------------------------------------------------------|
| Active memory | 369906 KB | 369906 KB | 369906 KB | 0 B |
| from large pool | 368384 KB | 368384 KB | 368384 KB | 0 B |
| from small pool | 1522 KB | 1522 KB | 1522 KB | 0 B |
|---------------------------------------------------------------------------|
| GPU reserved memory | 409600 KB | 409600 KB | 409600 KB | 0 B |
| from large pool | 407552 KB | 407552 KB | 407552 KB | 0 B |
| from small pool | 2048 KB | 2048 KB | 2048 KB | 0 B |
|---------------------------------------------------------------------------|
| Non-releasable memory | 39694 KB | 50508 KB | 263679 KB | 223985 KB |
| from large pool | 39168 KB | 48896 KB | 261632 KB | 222464 KB |
| from small pool | 526 KB | 2047 KB | 2047 KB | 1521 KB |
|---------------------------------------------------------------------------|
| Allocations | 251 | 251 | 251 | 0 |
| from large pool | 80 | 80 | 80 | 0 |
| from small pool | 171 | 171 | 171 | 0 |
|---------------------------------------------------------------------------|
| Active allocs | 251 | 251 | 251 | 0 |
| from large pool | 80 | 80 | 80 | 0 |
| from small pool | 171 | 171 | 171 | 0 |
|---------------------------------------------------------------------------|
| GPU reserved segments | 21 | 21 | 21 | 0 |
| from large pool | 20 | 20 | 20 | 0 |
| from small pool | 1 | 1 | 1 | 0 |
|---------------------------------------------------------------------------|
| Non-releasable allocs | 19 | 19 | 20 | 1 |
| from large pool | 18 | 18 | 19 | 1 |
| from small pool | 1 | 1 | 1 | 0 |
|---------------------------------------------------------------------------|
| Oversize allocations | 0 | 0 | 0 | 0 |
|---------------------------------------------------------------------------|
| Oversize GPU segments | 0 | 0 | 0 | 0 |
|===========================================================================|
CUDA memory summary after epoch 1:
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 2680 MB | 3158 MB | 3642 GB | 3639 GB |
| from large pool | 377 MB | 812 MB | 3440 GB | 3440 GB |
| from small pool | 2302 MB | 2346 MB | 201 GB | 199 GB |
|---------------------------------------------------------------------------|
| Active memory | 2680 MB | 3158 MB | 3642 GB | 3639 GB |
| from large pool | 377 MB | 812 MB | 3440 GB | 3440 GB |
| from small pool | 2302 MB | 2346 MB | 201 GB | 199 GB |
|---------------------------------------------------------------------------|
| GPU reserved memory | 2772 MB | 3282 MB | 3282 MB | 522240 KB |
| from large pool | 418 MB | 882 MB | 882 MB | 475136 KB |
| from small pool | 2354 MB | 2400 MB | 2400 MB | 47104 KB |
|---------------------------------------------------------------------------|
| Non-releasable memory | 93778 KB | 126483 KB | 3056 GB | 3056 GB |
| from large pool | 41216 KB | 70496 KB | 2826 GB | 2826 GB |
| from small pool | 52562 KB | 56630 KB | 229 GB | 229 GB |
|---------------------------------------------------------------------------|
| Allocations | 19128 | 19392 | 2592 K | 2573 K |
| from large pool | 81 | 198 | 1308 K | 1308 K |
| from small pool | 19047 | 19209 | 1284 K | 1265 K |
|---------------------------------------------------------------------------|
| Active allocs | 19128 | 19392 | 2592 K | 2573 K |
| from large pool | 81 | 198 | 1308 K | 1308 K |
| from small pool | 19047 | 19209 | 1284 K | 1265 K |
|---------------------------------------------------------------------------|
| GPU reserved segments | 1198 | 1238 | 1238 | 40 |
| from large pool | 21 | 38 | 38 | 17 |
| from small pool | 1177 | 1200 | 1200 | 23 |
|---------------------------------------------------------------------------|
| Non-releasable allocs | 1252 | 1258 | 1960 K | 1959 K |
| from large pool | 19 | 26 | 1064 K | 1064 K |
| from small pool | 1233 | 1239 | 896 K | 895 K |
|---------------------------------------------------------------------------|
| Oversize allocations | 0 | 0 | 0 | 0 |
|---------------------------------------------------------------------------|
| Oversize GPU segments | 0 | 0 | 0 | 0 |
|===========================================================================|
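To put a number on the growth between the two summaries above, here is a minimal sketch that parses the "Cur Usage" column of a pasted summary row and compares the two "Allocated memory" values. The helper names `to_bytes` and `current_usage` are hypothetical (they are not part of PyTorch), and the sketch assumes the KB/MB/GB units in the summary are multiples of 1024.

```python
import re

# Assumption: units in the summary are binary multiples (1 KB = 1024 B).
_UNIT = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}
_CELL = re.compile(r"^(\d+)\s*(B|KB|MB|GB)$")


def to_bytes(cell):
    """Convert a size cell such as '369906 KB' to a byte count (hypothetical helper)."""
    match = _CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"not a size cell: {cell!r}")
    return int(match.group(1)) * _UNIT[match.group(2)]


def current_usage(summary, metric):
    """Return the 'Cur Usage' value, in bytes, of the first row labelled `metric`."""
    for line in summary.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) >= 2 and cells[0] == metric:
            return to_bytes(cells[1])
    raise KeyError(metric)


# Rows copied from the two summaries in this post.
initial = "| Allocated memory | 369906 KB | 369906 KB | 369906 KB | 0 B |"
epoch1 = "| Allocated memory | 2680 MB | 3158 MB | 3642 GB | 3639 GB |"

before = current_usage(initial, "Allocated memory")
after = current_usage(epoch1, "Allocated memory")
growth_mb = (after - before) / 1024**2
print(f"Allocated memory grew by about {growth_mb:.0f} MB during epoch 1")
```

Running the same comparison on a summary taken after every epoch would show whether the allocated memory keeps climbing at a similar rate until the out-of-memory error.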