How to delete a Tensor in GPU to free up memory

How can I delete a Tensor on the GPU to free up memory?

I can get a Tensor on the GPU with Tensor.cuda(), but that just returns a copy on the GPU. How can I delete this Tensor from the GPU? I tried to delete it with del Tensor, but it doesn't work.

1 Like

del Tensor will delete it from GPU memory. Why do you think it doesn’t work?

7 Likes

Thank you very much!
I loaded an OrderedDict of pre-trained weights onto the GPU with torch.load(), then used a for loop to delete its elements, but there was no change in GPU memory.
Besides, it is strange that there was no change in GPU memory even after I deleted the whole OrderedDict of pre-trained weights.
The PyTorch version is 0.4.0.

2 Likes

I guess what you have missed here is torch.cuda.empty_cache. After del Tensor, call torch.cuda.empty_cache() and see whether GPU memory usage changes.
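Something along these lines should show the effect (a minimal sketch; the checkpoint path is just a placeholder for the pre-trained weights mentioned above):

import torch

# Load a checkpoint directly onto the GPU (placeholder path).
state_dict = torch.load('checkpoint.pth', map_location='cuda')
print(torch.cuda.memory_allocated())   # non-zero: the weights now live on the GPU

# Drop the last Python reference to the tensors...
del state_dict
print(torch.cuda.memory_allocated())   # back to 0

# ...then return the cached blocks to the driver so nvidia-smi reflects the drop as well.
torch.cuda.empty_cache()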

5 Likes

Thank you very much!
There is no change in GPU memory after executing torch.cuda.empty_cache().
I just want to manually delete some unused variables such as gradients or other intermediate variables to free up GPU memory. So I tested it by loading the pre-trained weights onto the GPU and then trying to delete them. I've tried del and torch.cuda.empty_cache(), but nothing happened.

1 Like

Could you show a minimal example? The following code works for me with PyTorch 1.1.0:

import torch
a = torch.zero(300000000, dtype=torch.int8, device='cuda')
b = torch.zero(300000000, dtype=torch.int8, device='cuda')
# Check GPU memory using nvidia-smi
del a
torch.cuda.empty_cache()
# Check GPU memory again

6 Likes

Thank you very much! I guess this problem is caused by the PyTorch version. I will change the version to 1.1.0.

Hi! I tested the code you provided. I found there is no torch.zero (it should be torch.zeros). But I found the GPU memory did not change even with PyTorch 1.1.0. This is very strange. By the way, I run this code in a Docker container.

1 Like

I found that when I del the tensor, the GPU still shows a small amount of memory in use.
The code is below:

import torch
t = torch.zeros([1024, 1024, 1024, 2], device='cuda:0')
del t
torch.cuda.empty_cache()

The GPU still shows about 700 MiB in use:

|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:05:00.0 Off |                  N/A |
| 33%   49C    P8    29W / 250W |    717MiB / 11016MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

These ~700 MB of device memory are used by the CUDA context, i.e. the native CUDA kernels in PyTorch as well as other libs such as cuDNN, NCCL, etc., and cannot be freed.
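One way to see this is to compare PyTorch's own allocator statistics with nvidia-smi (a rough sketch, using the tensor size from above):

import torch

t = torch.zeros([1024, 1024, 1024, 2], device='cuda:0')
del t
torch.cuda.empty_cache()

# The caching allocator now holds nothing...
print(torch.cuda.memory_allocated())   # 0
print(torch.cuda.memory_reserved())    # 0
# ...but nvidia-smi still shows a few hundred MiB: that is the CUDA context
# (native kernels, cuDNN, NCCL, ...), which stays until the process exits.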

11 Likes

Correct me if I'm wrong, but I load an image and convert it to a torch tensor with cuda(). When I do that and run torch.cuda.memory_allocated(), it goes from 0 to some allocated amount. But then, when I delete the image using del and run torch.cuda.reset_max_memory_allocated() and torch.cuda.empty_cache(), I see no change in torch.cuda.memory_allocated(). What should I do?

1 Like

That’s the right approach, which also works for me:

import torch
from PIL import Image
from torchvision import transforms

path = '...'
image = Image.open(path)

print(torch.cuda.memory_allocated())
> 0
print(torch.cuda.memory_reserved())
> 0

x = transforms.ToTensor()(image)
print(torch.cuda.memory_allocated())
> 0
print(torch.cuda.memory_reserved())
> 0

x = x.cuda()
print(torch.cuda.memory_allocated())
> 23068672
print(torch.cuda.memory_reserved())
> 23068672

del x
print(torch.cuda.memory_allocated())
> 0
print(torch.cuda.memory_reserved())
> 23068672

torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())
> 0
print(torch.cuda.memory_reserved())
> 0

12 Likes

I don't know how, but now it works. PyTorch is full of surprises.

One question, though: why does the memory still show up in nvidia-smi? Will this affect my overall memory usage?

1 Like

The first CUDA operation will create the CUDA context containing the native CUDA kernels, cuDNN, etc., and this memory will stay in use until the application is closed.

2 Likes

I had the same problem with del x and torch.cuda.empty_cache() not removing everything off the GPU. Eventually I wrapped the for loop in with torch.no_grad() and now it works. I think if the gradient is turned on, it saves intermediate steps, even if you delete the final product. Therefore, turning off the gradient should solve (some people’s) problems.
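A minimal sketch of that pattern (the model, data, and sizes here are made up for illustration):

import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()
batches = [torch.randn(64, 1024) for _ in range(10)]

outputs = []
with torch.no_grad():                 # no graph is built, so no intermediate activations are kept
    for batch in batches:
        out = model(batch.cuda())
        outputs.append(out.cpu())     # keep the results on the CPU side
        del out                       # drop the GPU reference inside the loop

torch.cuda.empty_cache()              # return the cached blocks to the driver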

2 Likes

Yes, either of these will do the job:

with torch.no_grad():

Or

x.detach().cpu()

1 Like

Once the loop is done (say with no_grad on), is there a way to iterate through and delete these intermediate computations?

Hi,

I still see that the memory usage stubbornly remains unchanged with the following two different approaches:

x.detach(), del x, torch.cuda.empty_cache()

I checked the attributes of x right after x.detach().cpu(), and x still has

is_cuda: True,
grad_fn: SelectBackward,
requires_grad: True

So the second approach replaces x.detach().cpu() with x = x.detach().cpu(), and the attributes of x right after x = x.detach().cpu() are as follows:

is_cuda: False,
grad_fn: None,
requires_grad: False

Even though requires_grad and the backward-related attributes go away successfully, x still remains in GPU memory. Is there any insight into how to delete x from GPU memory?

I am hesitant to use with torch.no_grad(): because I need a few variables in the loop to have requires_grad=True for the backward operation.

Thank you in advance.

Same problem here. I try to convert an fp32 tensor to an fp16 tensor with tensor.half() and delete the original fp32 tensor from memory. None of these approaches work.

It should work as described and verified here. Could you post an executable code snippet, which shows that it’s not working as intended, please?
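For reference, an executable snippet along these lines (sizes are arbitrary) should be enough to demonstrate it:

import torch

x = torch.randn(1024, 1024, 256, device='cuda')    # fp32, roughly 1 GiB
print(torch.cuda.memory_allocated())

x_half = x.half()                                   # fp16 copy, roughly 0.5 GiB on top
del x                                               # drop the fp32 original
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())                # only the fp16 tensor should remain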