How to fix the GPU memory allocation with a changing training size?

I have a task where, within one epoch, the image size is raised from 256 to 512. However, the GPU memory allocation is very unstable when training with a changing size. For example, if the image size changes from 256 to 264, the allocated GPU memory jumps dramatically from 9 GB to 20 GB and then back to 10 GB, and training becomes very slow. Although my model can be trained at 512x512, the unstable memory allocation still causes OOM errors. How can I fix the GPU memory allocation to the largest image size in PyTorch?

For variable input shapes I would recommend starting with the largest one and then reducing the size, if that is possible. This would allow you to reuse the already freed memory without running into memory fragmentation issues.
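Here is a minimal sketch of that idea, assuming a hypothetical model, batch size, and set of image sizes (none of these come from the original question): run one warm-up forward/backward pass at the largest resolution first, so the caching allocator reserves blocks large enough for the worst case, and then train with the smaller or varying sizes, which can reuse the cached memory.

```python
import torch
import torch.nn as nn

# Hypothetical model just for illustration; substitute your own.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),
).cuda()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Warm-up pass at the largest resolution (512x512 here) so the caching
# allocator reserves memory for the worst case up front.
warmup = torch.randn(8, 3, 512, 512, device="cuda")
loss = criterion(model(warmup), torch.randint(0, 10, (8,), device="cuda"))
loss.backward()
optimizer.zero_grad(set_to_none=True)
del warmup, loss

# Later iterations with smaller or varying sizes can then reuse the
# already-cached memory instead of triggering new, fragmented allocations.
for size in (256, 264, 320, 512):
    x = torch.randn(8, 3, size, size, device="cuda")
    target = torch.randint(0, 10, (8,), device="cuda")
    optimizer.zero_grad(set_to_none=True)
    loss = criterion(model(x), target)
    loss.backward()
    optimizer.step()
```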


Thanks, that is useful!