How can we release GPU memory cache?

I would like to do a hyper-parameter search, so I trained and evaluated with every combination of parameters.
But watching the memory usage in nvidia-smi, I found that GPU memory usage increased slightly after each hyper-parameter trial, and after several trials I finally got an out-of-memory error. I think it is due to CUDA memory caching for Tensors that are no longer in use. I know about torch.cuda.empty_cache, but it requires calling del on the variable beforehand. In my case, I couldn’t locate the memory-consuming variable.
What is the best way to release the GPU memory cache?



torch.cuda.empty_cache() (EDITED: fixed function name) will release all the GPU memory cache that can be freed.
If you still have some memory in use after calling it, that means a Python variable (either a torch Tensor or a torch Variable) still references it, so it cannot be safely released because you can still access it.

You should make sure that you are not holding onto some objects in your code that just grow bigger and bigger with each loop in your search.
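One way to look for objects that are still being held is to ask Python's garbage collector which tensors are reachable. The following is only a rough sketch: the `live_tensors` helper and the `history` list are hypothetical names made up for illustration, not part of PyTorch.

```python
import gc
import torch

def live_tensors(device_type=None):
    """List (shape, dtype, device) for every tensor the garbage collector can still reach."""
    found = []
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and (device_type is None or obj.device.type == device_type):
                found.append((tuple(obj.shape), obj.dtype, str(obj.device)))
        except Exception:
            # Some tracked objects raise on attribute access; skip them.
            continue
    return found

# Hypothetical stand-in for a list that keeps growing across trials.
history = [torch.zeros(3, 7, 11)]

for shape, dtype, device in live_tensors():
    print(shape, dtype, device)
```

Calling `live_tensors("cuda")` after each trial and comparing the output between trials should show which shapes keep accumulating.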


Thanks, @albanD !
So variables that are no longer referenced will be freed by calling torch.cuda.empty_cache(), right?

I started looking for the memory-consuming objects, but I couldn’t work out which variables were causing the problem.
As for the CUDA-related variables, I use the same variable names (e.g. model, criterion) in different trials.

Anyway, I suspect the evaluation loop below (sorry, it’s version 0.4; maybe the version causes the problem?):

for i, (X, y) in tqdm(enumerate(val_loader), total=len(val_loader)):
    X = Variable(X.cuda())
    y = Variable(y.squeeze().cuda(non_blocking=False))

    with torch.no_grad():
        outputs = model(X)
        loss = criterion(outputs, y)

    prec1, prec5 = accuracy(outputs.data, target, top_k=(1, 5))
    losses.update(loss.data[0], X.size(0))
    top1.update(prec1[0], X.size(0))
    top5.update(prec5[0], X.size(0))

So any variable that is no longer referenced is freed in the sense that its memory can be reused to create new tensors, but this memory is not released to the OS (so it will still look used in nvidia-smi).
empty_cache forces the allocator that PyTorch uses to release to the OS any memory it kept for allocating new tensors, so it makes a visible change in nvidia-smi, but in reality this memory was already available for new tensors.

Your code looks good. I would double-check that the things you send to your logger are not Variables but plain Python numbers, using .item() as necessary.

Also, if you’re using 0.4 (I assume current master), then you should remove Variable and .data from your code and replace loss.data[0] with loss.item().
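The difference between memory held by live tensors and memory cached by the allocator can be observed from inside Python. This is a small sketch that assumes a recent PyTorch release where torch.cuda.memory_allocated() and torch.cuda.memory_reserved() are available; the `cuda_memory_mib` helper is a hypothetical name, and the demo only does anything on a machine with a CUDA device.

```python
import torch

def cuda_memory_mib():
    """Return (allocated, reserved) in MiB, or (0.0, 0.0) when CUDA is unavailable."""
    if not torch.cuda.is_available():
        return 0.0, 0.0
    mib = 1024 ** 2
    return torch.cuda.memory_allocated() / mib, torch.cuda.memory_reserved() / mib

if torch.cuda.is_available():
    x = torch.empty(64, 1024, 1024, device="cuda")  # 256 MiB of float32
    print("tensor alive:     ", cuda_memory_mib())
    del x  # allocated drops, reserved stays: the block is cached for reuse
    print("after del:        ", cuda_memory_mib())
    torch.cuda.empty_cache()  # reserved drops too; this is what nvidia-smi reflects
    print("after empty_cache:", cuda_memory_mib())
```

The "allocated" figure is what matters for leaks: if it keeps rising from trial to trial, some tensor is still referenced, and empty_cache() cannot help.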


Thank you for the detailed answer! I tried other things such as del model, loss, ... but none of them helped…
Actually, I am using a slightly older version than the current master branch. But I had been thinking that I should remove Variable and refactor the code someday, so your advice is very helpful. Thanks!

AttributeError: module ‘torch.cuda’ has no attribute ‘empty’

Yes, that was a typo. I fixed it, thanks.


The variables prec1[0] and prec5[0] still hold references to tensors. They should be replaced with prec1[0].item() and prec5[0].item() respectively. The accuracy function in the PyTorch ImageNet training example returns tensors, and storing those can cause a memory leak.
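Putting the suggestions so far together, an evaluation loop for PyTorch 0.4 or later might look like the sketch below. It is not the original poster's code: the `evaluate` function, the toy linear model, and the random dataset are assumptions for illustration, and it tracks only top-1 accuracy rather than using the accuracy and meter helpers from the ImageNet example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, criterion, loader, device):
    """Return (average loss, top-1 accuracy) as plain Python floats."""
    model.eval()
    total_loss, total_correct, total = 0.0, 0, 0
    with torch.no_grad():  # no graph is built, so nothing extra is kept alive
        for X, y in loader:
            X, y = X.to(device), y.to(device)
            outputs = model(X)
            loss = criterion(outputs, y)
            # .item() copies the scalar to the host; no tensor is stored.
            total_loss += loss.item() * X.size(0)
            total_correct += (outputs.argmax(dim=1) == y).sum().item()
            total += X.size(0)
    return total_loss / total, total_correct / total

# Toy setup (hypothetical), runs on CPU when no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(10, 3).to(device)
data = TensorDataset(torch.randn(32, 10), torch.randint(0, 3, (32,)))
loader = DataLoader(data, batch_size=8)

avg_loss, top1 = evaluate(model, torch.nn.CrossEntropyLoss(), loader, device)
print(avg_loss, top1)
```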

@albanD Could you please clarify the difference between detach() and item() when called on a tensor? Are these effectively the same?


detach() returns a tensor that shares storage (and the same device) with the original tensor, while item() returns a converted Python object.
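A short sketch of that difference, using a made-up scalar loss on the CPU (the variable names are arbitrary):

```python
import torch

w = torch.ones(1, requires_grad=True)
loss = (w * 3.0).sum()          # 0-dim tensor attached to a graph

detached = loss.detach()        # still a tensor, same storage, no graph
value = loss.item()             # plain Python float, no tensor involved

print(type(detached), detached.requires_grad)
print(type(value), value)
print(detached.data_ptr() == loss.data_ptr())  # same underlying memory
```

Note that item() only works on tensors with exactly one element, whereas detach() works on tensors of any shape. A detached GPU tensor still occupies GPU memory for as long as it is referenced.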


I also faced a memory problem, and using this command worked.

Why doesn’t torch empty the cache automatically, though? Does emptying the cache have much overhead?

I have a question: is it safe to call torch.cuda.empty_cache() before each iteration during training?

This should be safe, but might negatively affect the performance, since PyTorch might need to reallocate this memory again.
What is your use case that you would like to call it for each iteration?

What is the negative effect? My code runs more and more slowly, and sometimes it is interrupted by a memory error, so I thought of calling it before each iteration. Does it have an effect on speed?

If you see increasing memory usage, you might be accidentally storing some tensors with an attached computation graph. E.g. if you store the loss for printing or debugging purposes, you should save loss.item() instead.

This issue won’t be solved if you clear the cache repeatedly. As I said, this might just trigger unnecessary allocations, which take some time and can slow down your code.
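To make the failure mode concrete, here is a small sketch with a toy model on the CPU (all names are made up for illustration). Appending the loss tensor keeps its whole computation graph reachable, while appending loss.item() stores only a float.

```python
import torch

model = torch.nn.Linear(4, 1)
criterion = torch.nn.MSELoss()

leaky_history = []  # tensors: each one keeps its graph (and activations) alive
safe_history = []   # floats: nothing from the graph is retained

for _ in range(3):
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    loss = criterion(model(x), y)
    leaky_history.append(loss)
    safe_history.append(loss.item())

print([t.grad_fn is not None for t in leaky_history])
print(safe_history)
```

On a GPU the same pattern makes allocated memory grow every iteration, which no amount of cache clearing can undo because the tensors are still referenced.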


I see.
Thanks very much.

Just to clarify, does item() deallocate loss? It’s not clear to me what exactly item() is doing.

.item() converts a Tensor containing a single element into a python number.
It does not “deallocate” loss. But it won’t keep it alive.


This works only some of the time.

Even when I clear out all the variables, restart the kernel, and execute torch.cuda.empty_cache() as the first line in my code, I still get a ‘CUDA out of memory’ error.

I have the same issue as well. Has anyone found out why this happens?



Running empty_cache at the beginning of your process is not useful as nothing is allocated yet.
When you restart the kernel, you force all memory to be deallocated.
So if you still run out of memory it is simply because your program requires more than what you have. You will most likely have to reduce the batch size or the size of your model.
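To check whether a program simply needs more memory than the device has, one option is to measure the peak allocation of a single training step at a few batch sizes. This is a rough sketch assuming a PyTorch release that provides torch.cuda.reset_peak_memory_stats() and torch.cuda.max_memory_allocated(); `peak_memory_mib`, `make_step`, and the toy linear model are hypothetical names for illustration.

```python
import torch

def peak_memory_mib(step_fn):
    """Run step_fn once and return its peak CUDA allocation in MiB (None without CUDA)."""
    if not torch.cuda.is_available():
        step_fn()
        return None
    torch.cuda.reset_peak_memory_stats()
    step_fn()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1024 ** 2

def make_step(batch_size, device):
    """Build one forward/backward step for a toy model (stand-in for a real one)."""
    model = torch.nn.Linear(256, 256).to(device)
    def step():
        x = torch.randn(batch_size, 256, device=device)
        model(x).sum().backward()
    return step

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for batch_size in (32, 64):
    print(batch_size, peak_memory_mib(make_step(batch_size, device)))
```

If the peak for one step is already close to the device's total memory, the out-of-memory error is not a leak, and a smaller batch size or model is the likely remedy.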