How can I handle the CUDA out of memory error?

Hi, I am trying to run a ResNet18 model on a GTX 1080 GPU, but I get a CUDA out-of-memory error. I reduced the batch size to 4 and tried

    learn = 0                   # drop the reference to the learner object
    gc.collect()                # run Python's garbage collector
    torch.cuda.empty_cache()    # release cached CUDA memory back to the driver

but it didn’t work.
Could anyone give me some advice?

Clearing the cache won’t avoid the OOM issue and could just slow down your code, so you would either need to reduce the batch size further, lower the memory usage of the model (e.g. fewer/smaller layers), reduce the spatial size of the input, or use torch.utils.checkpoint to trade compute for memory.


What kind of help can torch.utils.checkpoint provide?

Could you please give me example code that shows how torch.utils.checkpoint could be used for my problem?
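Not the original poster, but here is a minimal sketch of activation checkpointing with `torch.utils.checkpoint.checkpoint_sequential` on a small stand-in stack of conv layers (not the actual ResNet18; the layer sizes and input shape here are made up for illustration). The model is split into segments, and only the activations at segment boundaries are stored; everything in between is recomputed during the backward pass, which lowers peak memory at the cost of extra compute.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Toy stand-in for a deep conv stack (hypothetical sizes, not ResNet18).
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 128, 3, padding=1),
    nn.ReLU(),
)

# Input needs requires_grad=True so checkpointing can recompute
# activations in the backward pass.
x = torch.randn(2, 3, 64, 64, requires_grad=True)

# Split the sequential model into 2 segments; intermediate activations
# inside each segment are recomputed instead of stored.
out = checkpoint_sequential(model, 2, x)
out.sum().backward()
```

For a real ResNet18 you would checkpoint the residual blocks (e.g. wrap each `layerX` with `torch.utils.checkpoint.checkpoint`) rather than a flat `nn.Sequential`.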

A GTX 1080 is more than enough to handle a batch size of 64 with ease; ResNet18 is a relatively small network.

Look for a CUDA memory leak somewhere else; maybe you keep referencing/copying some tensors you don’t need, etc.

That seems reasonable. How can I check for it? Is there any tool or method?
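One simple way is to print `torch.cuda.memory_allocated()` inside the training loop; if the number grows steadily across iterations, something is holding on to tensors. A classic culprit is appending the loss tensor itself (which keeps the whole computation graph alive) instead of `loss.item()`. A minimal sketch, with a made-up tiny model just to show the pattern:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical tiny model, only to demonstrate the monitoring pattern.
model = torch.nn.Linear(10, 1).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(3):
    x = torch.randn(4, 10, device=device)
    loss = model(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Store a plain Python float, NOT the tensor: `losses.append(loss)`
    # would keep every iteration's graph (and its GPU memory) alive.
    losses.append(loss.item())
    if device == "cuda":
        print(f"step {step}: {torch.cuda.memory_allocated() / 1024**2:.1f} MB allocated")
```

If the printed value keeps climbing, bisect your loop (comment out logging, metric accumulation, etc.) until it stops. `torch.cuda.memory_summary()` gives a more detailed breakdown.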