Thank you for your response.
Yes, I understand that clearing the cache after restarting is not sensible, since the memory should ideally already be deallocated.
But if my model was able to train with a certain batch size for the past 'n' attempts, why does it stop doing so on my 'n+1'th attempt? I do not see how reducing the batch size would solve this problem.
As I said, this happens very randomly.
Although my program is obviously able to detect the GPU (as it says "CUDA out of memory"), I still wanted to check it programmatically. So I inserted print(torch.cuda.is_available()) right before training began. And voila! It worked. No more CUDA out of memory errors, at least for the time being. There is no logic behind why it started working now.
Other programs using the same GPU can also cause this: if you are sharing the machine, if a screen is connected to it, or even if another PyTorch script uses the GPU by mistake.
I am accessing the machine through a Remote Desktop connection.
And apart from the main program, there are no Python scripts running in the background. There is certainly other software open, such as VSCode and multiple Google Chrome tabs.
But this was the case even when the main program was running smoothly (i.e. without OOM error).
Hello, I have the same problem. I run torch.cuda.empty_cache() after the last group of images finishes training, then I start training a new group without restarting the kernel, but the GPU memory used still keeps getting bigger and bigger.
I described the problem in this topic; I wonder if you have any good suggestions, thanks! https://discuss.pytorch.org/c/memory-format/23
On Linux, sometimes an old process keeps hold of the GPU. You can check for such processes by running nvidia-smi in the terminal. Note the PID of any process utilizing the GPU and kill it with sudo kill <enter PID here>.
I was about to ask a question but I found my issue. Maybe it will help others.
I was on Google Colab and found that I could train my model several times, but on the 3rd or 4th time I'd run into the memory error. Using torch.cuda.empty_cache() between runs did not help. All I could do was restart my kernel.
I had a setup of the sort:

class Fitter:
    def __init__(self, model):
        self.model = model
        self.optimizer = ...  # init optimizer here
The point is that I was carrying the model over in between runs but making a new optimizer (in my case I was making new instances of Fitter). And in my case, the (Adam) optimizer state actually took up more memory than my model!
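That the Adam state can outweigh the model is easy to check. Here is a small sketch (the layer size is arbitrary, just for illustration): Adam lazily allocates an exp_avg and an exp_avg_sq tensor per parameter on its first step, so its state holds roughly twice as many elements as the model itself.

```python
import torch

model = torch.nn.Linear(1000, 1000)
optimizer = torch.optim.Adam(model.parameters())

# The state is allocated lazily, on the first optimizer step
model(torch.randn(4, 1000)).sum().backward()
optimizer.step()

param_numel = sum(p.numel() for p in model.parameters())
state_numel = sum(
    t.numel()
    for s in optimizer.state.values()
    for t in s.values()
    if isinstance(t, torch.Tensor)
)
print(param_numel, state_numel)  # the state is at least 2x the parameters
```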
So to fix it I tried some things.
This did not work:

def wipe_memory(self):  # DOES NOT WORK
    self.optimizer = None
    torch.cuda.empty_cache()
Neither did this:

def wipe_memory(self):  # DOES NOT WORK
    del self.optimizer
    self.optimizer = None
    gc.collect()
    torch.cuda.empty_cache()
This did work!

def wipe_memory(self):  # DOES WORK
    self._optimizer_to(torch.device('cpu'))
    del self.optimizer
    gc.collect()
    torch.cuda.empty_cache()

def _optimizer_to(self, device):
    for param in self.optimizer.state.values():
        # Not sure there are any global tensors in the state dict
        if isinstance(param, torch.Tensor):
            param.data = param.data.to(device)
            if param._grad is not None:
                param._grad.data = param._grad.data.to(device)
        elif isinstance(param, dict):
            for subparam in param.values():
                if isinstance(subparam, torch.Tensor):
                    subparam.data = subparam.data.to(device)
                    if subparam._grad is not None:
                        subparam._grad.data = subparam._grad.data.to(device)
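For reference, the same idea can be written as a standalone helper outside the Fitter class. This is a sketch that runs entirely on CPU (so no GPU is needed to try it); optimizer_to is a made-up name, not a PyTorch API:

```python
import gc

import torch

def optimizer_to(optimizer, device):
    # Move every tensor held in the optimizer state to `device`.
    for state in optimizer.state.values():
        for key, value in state.items():
            if isinstance(value, torch.Tensor):
                state[key] = value.to(device)

model = torch.nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters())

# One step so Adam actually allocates its state
model(torch.randn(2, 8)).sum().backward()
optimizer.step()

optimizer_to(optimizer, torch.device('cpu'))
all_on_cpu = all(
    v.device.type == 'cpu'
    for s in optimizer.state.values()
    for v in s.values()
    if isinstance(v, torch.Tensor)
)

# Now the optimizer can be dropped and the cached memory released
del optimizer
gc.collect()
torch.cuda.empty_cache()  # no-op when CUDA was never initialized
```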
Is this mandatory in the training session? For instance, I've built this function to do my training:
def train(...):
    # Model: train
    model.train()
    # Load the data and convert to device
    for (data, label) in loader:
        ...
        # Refresh the gradients
        optimizer.zero_grad(set_to_none=True)
        # Calculate loss
        loss = model.objective(x)
        # Backprop
        loss.backward()
        # Optimizer step
        optimizer.step()
Should I keep it as it is, or am I supposed to call .item() on the loss to be able to free some space on my GPU?
No, this function looks good.
You should use .item() if you want to store the value of your loss in a list for further plotting/tracking (basically anything that would make it outlive the inner loop). Otherwise, you don't need to worry about this.
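As a small sketch of that pattern (a hypothetical toy model, just to show the types involved): appending loss.item() stores plain floats, so the list does not keep the autograd graph alive across iterations.

```python
import torch

model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)

losses = []
for _ in range(3):
    loss = model(x).pow(2).mean()
    # .item() returns a plain Python float detached from the autograd
    # graph, so appending it does not keep activations alive.
    losses.append(loss.item())
```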
One last thing I wonder: would it cause any problem, or would it contribute to freeing up cached memory, if I do something like this:
# Training arrangements
...
# Backprop
loss.backward()
# Optimizer step
optimizer.step()
# Then, delete loss object
del loss
# and free cache
torch.cuda.empty_cache()
This will slow down your training (empty_cache is an expensive call), but otherwise, in 99.9% of cases it won't do anything else.
Emptying the cache is already done automatically if you're about to run out of memory, so there is no reason to do it by hand unless you have multiple processes using the same GPU and you want this process to free up space for the other process to use. Which is a very, very unusual thing to do.
Based on your description it seems you are storing (unwanted) data and are thus increasing the memory usage until you eventually run into an OOM error.
Freeing the cache will not avoid these errors; besides slowing down your code, it will only allow other processes to use the GPU memory.
@albanD can you clarify what this means? If loss.item() does not deallocate the GPU memory, does it keep the related graph in GPU memory and then return a float?
Is PyTorch keeping track of activations in the graph only in the model, or also whenever the loss variable is in scope or referenced somewhere, e.g. in an object?
Yeah, it is a bit overloaded. If you do value = loss.item(), it does not change anything about the loss tensor at that line. So in that sense it will not "deallocate" the loss.
But it will still behave differently than tensor = loss.clone(), for example. clone will keep building the autograd graph, and so tensor will potentially keep quite a bit of stuff alive through it. This does not happen with .item(), as it returns a plain number and thus does not extend the autograd graph.
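A minimal illustration of the difference (a toy scalar loss, just to inspect the types):

```python
import torch

w = torch.randn(3, requires_grad=True)
loss = (w * w).sum()

cloned = loss.clone()  # still a tensor attached to the autograd graph
value = loss.item()    # a plain Python float, no graph attached

print(cloned.grad_fn is not None)  # clone records a backward node
print(type(value))
```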