How to clean GPU memory after a RuntimeError?

I'm trying multiple models on my data.
Sometimes I get the following RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/generic/THCTensorMath.cu:35

The model was too big, OK!

The problem is that when I try the next model (a really small one), I get the same RuntimeError, even if I do del old_model beforehand,

which means the GPU memory is not freed even after a del.

Do you have a way to recover from a CUDA out of memory error?
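
Roughly what I'm doing, with placeholder layers standing in for my real models:

    import torch
    import torch.nn as nn

    old_model = nn.Linear(4096, 4096).cuda()  # placeholder for the model that was too big
    # ... the forward pass raises: cuda runtime error (2) : out of memory ...

    del old_model                             # I expected this to free the GPU memory

    small_model = nn.Linear(16, 16).cuda()    # a really small model, but the same OOM appears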

FairSeq recovers the training and validation loops if they run into OOM issues.
Have a look at this code.

Depending on where the OOM error occurred, they either skip the batch or clear the cache.


Hi @ptrblck,
I'm having a similar problem. Whenever I run my code I get a “cuda out of memory” error.
I've checked the FairSeq method, but I couldn't figure out where to put it and how to use it:

            try:
                _loss, sample_size, logging_output = self.task.valid_step(
                    sample, self.model, self.criterion
                )
            except RuntimeError as e:
                if 'out of memory' in str(e) and not raise_oom:
                    print('| WARNING: ran out of memory, retrying batch')
                    for p in self.model.parameters():
                        if p.grad is not None:
                            del p.grad  # free some memory
                    torch.cuda.empty_cache()
                    return self.valid_step(sample, raise_oom=True)
                else:
                    raise e

Where should I use the try block? When defining my network? Please help.
Thanks.

You could wrap the forward and backward pass to free the memory if the current sequence was too long and you ran out of memory.
However, this code won’t magically work on all types of models, so if you encounter this issue on a model with a fixed size, you might just want to lower your batch size.
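
A minimal sketch of that wrapping, loosely adapted from the FairSeq snippet above; the model, optimizer, and data below are just placeholders:

    import torch
    import torch.nn as nn

    # Placeholder model, optimizer, and data -- substitute your own.
    model = nn.Linear(128, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()

    def train_step(batch, target):
        try:
            optimizer.zero_grad()
            loss = criterion(model(batch), target)
            loss.backward()
            optimizer.step()
        except RuntimeError as e:
            if 'out of memory' in str(e):
                print('| WARNING: ran out of memory, skipping batch')
                # Drop whatever the failed step left behind before continuing.
                for p in model.parameters():
                    if p.grad is not None:
                        del p.grad
                torch.cuda.empty_cache()
            else:
                raise e

    batch = torch.randn(32, 128, device='cuda')
    target = torch.randint(0, 10, (32,), device='cuda')
    train_step(batch, target)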


I think the FairSeq code doesn’t work in your case, as your input shape seems to be static and you already run into OOM issues using batch_size=1.
What GPU are you using, i.e. how much memory is available?
You might want to reduce the number of filters to reduce the memory footprint.
Also, torch.utils.checkpoint might be another approach to trade compute for memory.
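
For the checkpointing idea, here is a small sketch; the two-stage toy model is only an illustration:

    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint

    # Toy two-stage model -- replace the stages with your own blocks.
    class CheckpointedNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.stage1 = nn.Sequential(nn.Linear(128, 256), nn.ReLU())
            self.stage2 = nn.Linear(256, 10)

        def forward(self, x):
            # stage1 activations are not stored; they are recomputed
            # during the backward pass, trading compute for memory.
            x = checkpoint(self.stage1, x)
            return self.stage2(x)

    model = CheckpointedNet().cuda()
    x = torch.randn(32, 128, device='cuda', requires_grad=True)
    model(x).sum().backward()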

My laptop GPU is a GTX 1050 with 4 GB of DDR4 RAM.
But I've tested it on Google Colab with 12 GB of RAM and the same thing happens.
I'm going to give up on this exercise because I couldn't find any solution to it.
Anyway, thanks for your help :heart:

I had this problem too when I tried to perform inference with a collection of models, say model_1, …, model_n.
I solved it by detaching the output from the computation graph:

output = model_i(input).detach()

I am not an expert in how the GPU works, but I think PyTorch keeps the computation graph (the intermediate results it would need to compute gradients) alive as long as the output still references it, even if you only run inference. That can be a significant amount of memory if your model has a lot of parameters. You can tell PyTorch not to keep that memory by detaching the output from the graph.
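
A sketch of that pattern for a few models; the models and input below are placeholders. Wrapping the loop in torch.no_grad() avoids building the graph in the first place:

    import torch
    import torch.nn as nn

    # Placeholders standing in for model_1, ..., model_n.
    models = [nn.Linear(64, 8).cuda() for _ in range(3)]
    input = torch.randn(16, 64, device='cuda')

    outputs = []
    for model_i in models:
        # .detach() drops the reference to the computation graph,
        # so its buffers can be freed once the model has run.
        outputs.append(model_i(input).detach())

    # Alternatively, skip building the graph altogether during inference:
    with torch.no_grad():
        outputs = [model_i(input) for model_i in models]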