How can I handle the CUDA out of memory error?

Hi, I am trying to run a ResNet18 model on a GTX 1080 GPU, but I get a CUDA out-of-memory error. I reduced the batch size to 4 and tried

    learn = 0                   # drop the reference to the learner object
    gc.collect()                # run Python's garbage collector
    torch.cuda.empty_cache()    # release cached CUDA memory back to the driver

but it didn’t work.
Could anyone give me some advice?

Clearing the cache won’t avoid the OOM issue and could just slow down your code, so you would either need to reduce the batch size further, lower the memory usage of the model (e.g. fewer/smaller layers), reduce the spatial size of the input, or use torch.utils.checkpoint to trade compute for memory.


What kind of help can torch.utils.checkpoint provide?

Could you please give me example code that shows how torch.utils.checkpoint could be used for my problem?
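Not the original poster, but here is a minimal sketch of activation checkpointing with `torch.utils.checkpoint.checkpoint_sequential` on a small stand-in stack of conv layers (not the actual ResNet18; the layer sizes and input shape here are made up for illustration). The model is split into segments, and only the activations at segment boundaries are stored; everything in between is recomputed during the backward pass, which lowers peak memory at the cost of extra compute.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Toy stand-in for a deep conv stack (hypothetical sizes, not ResNet18).
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 128, 3, padding=1),
    nn.ReLU(),
)

# Input needs requires_grad=True so checkpointing can recompute
# activations in the backward pass.
x = torch.randn(2, 3, 64, 64, requires_grad=True)

# Split the sequential model into 2 segments; intermediate activations
# inside each segment are recomputed instead of stored.
out = checkpoint_sequential(model, 2, x)
out.sum().backward()
```

For a real ResNet18 you would checkpoint the residual blocks (e.g. wrap each `layerX` with `torch.utils.checkpoint.checkpoint`) rather than a flat `nn.Sequential`.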

A GTX 1080 is more than enough to handle a batch size of 64 with ease; ResNet18 is a relatively small network.

Look for a CUDA memory leak somewhere else; maybe you keep referencing/copying some tensors you don’t need, etc.

That seems reasonable. How can I check for it? Is there any tool or method?
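One simple way is to print `torch.cuda.memory_allocated()` inside the training loop; if the number grows steadily across iterations, something is holding on to tensors. A classic culprit is appending the loss tensor itself (which keeps the whole computation graph alive) instead of `loss.item()`. A minimal sketch, with a made-up tiny model just to show the pattern:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical tiny model, only to demonstrate the monitoring pattern.
model = torch.nn.Linear(10, 1).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(3):
    x = torch.randn(4, 10, device=device)
    loss = model(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Store a plain Python float, NOT the tensor: `losses.append(loss)`
    # would keep every iteration's graph (and its GPU memory) alive.
    losses.append(loss.item())
    if device == "cuda":
        print(f"step {step}: {torch.cuda.memory_allocated() / 1024**2:.1f} MB allocated")
```

If the printed value keeps climbing, bisect your loop (comment out logging, metric accumulation, etc.) until it stops. `torch.cuda.memory_summary()` gives a more detailed breakdown.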