Memory issues in computer vision


I’m running a training loop on around 1,700 images at 512x512 resolution with a batch size of 8. I have 15 GB of GPU memory, yet I cannot use a batch size higher than 8 without getting an out-of-memory error.

Is there something I’m doing wrong in the code or do those numbers look right?

The images themselves will take only a small portion of the device memory; the majority will be used by the actual model training, i.e. parameters, forward activations, gradients, etc.
The overall memory requirement thus depends on the actual model architecture as well as the input shape.
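As a rough sanity check on the first point, here is a sketch of the arithmetic for the input batch alone, assuming 3-channel float32 images (the helper name `batch_bytes` is just for illustration):

```python
def batch_bytes(batch_size, channels, height, width, bytes_per_element=4):
    """Memory footprint of one input batch in bytes (float32 by default)."""
    return batch_size * channels * height * width * bytes_per_element

# A batch of eight 512x512 RGB float32 images:
mb = batch_bytes(8, 3, 512, 512) / 1024**2
print(f"{mb:.1f} MB")  # prints "24.0 MB"
```

So the batch occupies about 24 MB, i.e. well under 1% of the 15 GB, which is why the activations, gradients, and parameters of the model dominate the budget.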