CUDA out of memory when calculating through the VGG layer

I’m studying how to modify the SSD model and am referring to the code here.

What puzzles me is that the problem occurs during the forward computation through the VGG layers.

train.py

...
for iteration in range(args.start_iter, cfg['max_iter']):
    ...
    out = net(images)
    ...

ssd.py

def forward(self, x):
    for k in range(23):
        x = self.vgg[k](x).detach()
        print(torch.cuda.memory_allocated() / 1024**2)
  • When batch_size is set to 16, training usually runs normally.
    Below is what it prints:

461.8271484375
461.8271484375
461.3896484375
461.3896484375
197.7177734375
285.6083984375
285.6083984375
285.6083984375
285.6083984375
153.7724609375
197.7177734375
197.7177734375
197.7177734375
197.7177734375
197.7177734375
197.7177734375
132.3896484375
154.9521484375
154.9521484375
154.9521484375
154.9521484375
154.9521484375
154.9521484375

  • When batch_size is set to 32, it runs out of CUDA memory.
    Below is what it prints:

830.306640625
830.306640625
Traceback (most recent call last):

RuntimeError: CUDA out of memory. Tried to allocate 3.09 GiB (GPU 0; 8.00 GiB total capacity; 1.50 GiB already allocated; 3.52 GiB free; 2.44 GiB reserved in total by PyTorch)

  • Sometimes it even OOMs when batch_size is set to 16,
    and then runs normally again after a reboot.
    Below is what it prints when it OOMs:

461.8349609375
461.8349609375
461.3974609375
461.3974609375
197.7255859375
285.6162109375
285.6162109375
285.6162109375
285.6162109375
153.7802734375
197.7255859375
197.7255859375
197.7255859375
197.7255859375
197.7255859375
197.7255859375
132.3974609375
Traceback (most recent call last):

RuntimeError: CUDA out of memory. Tried to allocate 4.26 GiB (GPU 0; 8.00 GiB total capacity; 177.52 MiB already allocated;
5.51 GiB free; 474.00 MiB reserved in total by PyTorch)

The questions I want to ask:

  1. Why does the forward computation through the VGG layers cause CUDA OOM?
    This is NOT the step where the images are sent to the GPU.

  2. Why does it get through two iterations of the loop (i.e. print twice) before failing when batch_size is set to 32?

  3. Why does the OOM happen somewhat randomly? Is something from the previous run not cleaned up on the GPU (something that can only be cleared by a reboot)?

  4. How can I solve this problem?


OS: Windows 10
Python: 3.6.8
Pytorch: 1.5.1
nvcc: 10.2
GPU: NVIDIA GeForce RTX 2070 SUPER (8192MB)

A CUDA OOM simply means there isn’t enough memory on your video card to store everything that needs to live there:
your model’s parameters, the physical size of your data, the model’s graphs, the optimizer’s variables, various internal parameters PyTorch keeps, plus anything left unfreed from the previous run.
So to answer your questions in order:

  1. The forward computation through the VGG layers creates temporary values that have to be stored in CUDA memory, and that is what can cause the OOM.
  2. Memory left unfreed from the previous run adds up and causes the OOM.
  3. The OOM happens somewhat randomly for various reasons, mostly a combination of unfreed memory from earlier runs, the temporary variables created during computation, etc. (see the sketch after this list for how to inspect and release it).
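
If you want to check whether memory really is lingering, here is a minimal sketch (plain PyTorch calls, nothing SSD-specific) that inspects and releases what the current process holds:

import gc

import torch

def report_and_release_gpu_memory():
    """Release cached CUDA memory and print what this process still holds (in MB)."""
    gc.collect()                  # drop unreachable Python objects (and the tensors they own)
    torch.cuda.empty_cache()      # return cached but unused blocks to the driver
    print("allocated MB:", torch.cuda.memory_allocated() / 1024**2)
    print("reserved  MB:", torch.cuda.memory_reserved() / 1024**2)

Note that empty_cache() can only return blocks this process has cached but no longer references; memory held by a previous Python process that never exited has to be reclaimed by killing that process (check nvidia-smi), which would also explain why a reboot "fixes" it.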

a) Reduce your physical batch size and use gradient accumulation to reach a larger virtual batch size (see the sketch after this list).
b) Be careful about how you send data to CUDA. Define a fixed-size tensor on the GPU and copy each CPU batch into that allocated space to reduce overhead and save memory (though it might make your model run a tiny bit slower).
c) Take advantage of the RTX 20xx tensor cores and use FP16 to double your batch size and halve your run time. Take a look at https://github.com/NVIDIA/apex (a mixed-precision sketch also follows this list).
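
For (a), a minimal gradient-accumulation sketch follows. net, optimizer, criterion, and data_loader are placeholders for whatever your training script defines (the SSD repo's loss call may look slightly different):

accum_steps = 4                       # e.g. 4 micro-batches of 8 behave like one batch of 32
optimizer.zero_grad()

for step, (images, targets) in enumerate(data_loader):
    images = images.cuda()
    out = net(images)
    loss = criterion(out, targets)

    # Divide so the accumulated gradient matches the average over the large virtual batch.
    (loss / accum_steps).backward()

    if (step + 1) % accum_steps == 0:
        optimizer.step()              # update weights once every accum_steps micro-batches
        optimizer.zero_grad()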
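
For (c), PyTorch 1.5.1 does not yet have the native torch.cuda.amp API (that arrived in 1.6), so mixed precision goes through apex. A sketch following apex's documented pattern, with the same placeholder names as above:

from apex import amp

net = net.cuda()
# "O1" runs most ops in FP16 while keeping FP32 master weights.
net, optimizer = amp.initialize(net, optimizer, opt_level="O1")

for images, targets in data_loader:
    images = images.cuda()
    out = net(images)
    loss = criterion(out, targets)

    # Loss scaling keeps small FP16 gradients from underflowing to zero.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
    optimizer.zero_grad()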
