How to delete a Tensor in GPU to free up memory

How can I delete a Tensor on the GPU to free up memory?

I can get a Tensor on the GPU with Tensor.cuda(), but that just returns a copy on the GPU. How can I delete this Tensor from the GPU? I tried to delete it with del Tensor, but it doesn't work.

1 Like

del Tensor will delete it from GPU memory. Why do you think it doesn’t work?

7 Likes

Thank you very much!
I loaded an OrderedDict of pre-trained weights onto the GPU with torch.load(), then used a for loop to delete its elements, but there was no change in GPU memory.
Besides, it is strange that there was no change in GPU memory even after I deleted the whole OrderedDict of pre-trained weights.
The PyTorch version is 0.4.0.

2 Likes

I guess what you have missed here is torch.cuda.empty_cache. After del Tensor, call torch.cuda.empty_cache() and see whether GPU memory usage changes.
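Something along these lines should show the effect (a minimal sketch; the checkpoint path is just a placeholder for the pre-trained weights mentioned above):

import torch

# Load a checkpoint directly onto the GPU (placeholder path).
state_dict = torch.load('checkpoint.pth', map_location='cuda')
print(torch.cuda.memory_allocated())   # non-zero: the weights now live on the GPU

# Drop the last Python reference to the tensors...
del state_dict
print(torch.cuda.memory_allocated())   # back to 0

# ...then return the cached blocks to the driver so nvidia-smi reflects the drop as well.
torch.cuda.empty_cache()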

5 Likes

Thank you very much!
There is no change in GPU memory after executing torch.cuda.empty_cache().
I just want to manually delete some unused variables such as gradients or other intermediate variables to free up GPU memory. So I tested it by loading the pre-trained weights onto the GPU and then trying to delete them. I've tried del and torch.cuda.empty_cache(), but nothing happened.

1 Like

Could you show a minimal example? The following code works for me with PyTorch 1.1.0:

import torch
a = torch.zero(300000000, dtype=torch.int8, device='cuda')
b = torch.zero(300000000, dtype=torch.int8, device='cuda')
# Check GPU memory using nvidia-smi
del a
torch.cuda.empty_cache()
# Check GPU memory again

6 Likes

Thank you very much! I guess this problem is caused by the PyTorch version. I will change the version to 1.1.0.

Hi! I tested the code you provided. I found there is no torch.zero (it should be torch.zeros). But I found the GPU memory did not change even with PyTorch 1.1.0. This is very strange. By the way, I run this code in a Docker container.

1 Like

I found that when I del the tensor, the GPU still shows a small amount of memory in use.
The code is below:

import torch
t = torch.zeros([1024, 1024, 1024, 2], device='cuda:0')
del t
torch.cuda.empty_cache()

The GPU still shows about 700 MiB in use:

|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:05:00.0 Off |                  N/A |
| 33%   49C    P8    29W / 250W |    717MiB / 11016MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

These ~700 MB of device memory are used by the CUDA context, i.e. the native CUDA kernels in PyTorch as well as other libs such as cuDNN, NCCL, etc., and cannot be freed.
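One way to see this is to compare PyTorch's own allocator statistics with nvidia-smi (a rough sketch, using the tensor size from above):

import torch

t = torch.zeros([1024, 1024, 1024, 2], device='cuda:0')
del t
torch.cuda.empty_cache()

# The caching allocator now holds nothing...
print(torch.cuda.memory_allocated())   # 0
print(torch.cuda.memory_reserved())    # 0
# ...but nvidia-smi still shows a few hundred MiB: that is the CUDA context
# (native kernels, cuDNN, NCCL, ...), which stays until the process exits.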

11 Likes

Correct me if I'm wrong, but I load an image and convert it to a torch tensor with cuda(). When I do that and run torch.cuda.memory_allocated(), it goes from 0 to some allocated amount. But then, when I delete the image using del and run torch.cuda.reset_max_memory_allocated() and torch.cuda.empty_cache(), I see no change in torch.cuda.memory_allocated(). What should I do?

1 Like

That’s the right approach, which also works for me:

import torch
from PIL import Image
from torchvision import transforms

path = '...'
image = Image.open(path)

print(torch.cuda.memory_allocated())
> 0
print(torch.cuda.memory_reserved())
> 0

x = transforms.ToTensor()(image)
print(torch.cuda.memory_allocated())
> 0
print(torch.cuda.memory_reserved())
> 0

x = x.cuda()
print(torch.cuda.memory_allocated())
> 23068672
print(torch.cuda.memory_reserved())
> 23068672

del x
print(torch.cuda.memory_allocated())
> 0
print(torch.cuda.memory_reserved())
> 23068672

torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())
> 0
print(torch.cuda.memory_reserved())
> 0

12 Likes

I don't know how, but now it works. PyTorch is full of surprises.

One question, though: why does the memory still show up in nvidia-smi? Will this affect my overall memory usage?

1 Like

The first CUDA operation will create the CUDA context containing the native CUDA kernels, cuDNN, etc., and this memory will stay in use until the application is closed.

2 Likes

I had the same problem with del x and torch.cuda.empty_cache() not removing everything off the GPU. Eventually I wrapped the for loop in with torch.no_grad() and now it works. I think if the gradient is turned on, it saves intermediate steps, even if you delete the final product. Therefore, turning off the gradient should solve (some people’s) problems.
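A minimal sketch of that pattern (the model, data, and sizes here are made up for illustration):

import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()
batches = [torch.randn(64, 1024) for _ in range(10)]

outputs = []
with torch.no_grad():                 # no graph is built, so no intermediate activations are kept
    for batch in batches:
        out = model(batch.cuda())
        outputs.append(out.cpu())     # keep the results on the CPU side
        del out                       # drop the GPU reference inside the loop

torch.cuda.empty_cache()              # return the cached blocks to the driver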

2 Likes

Yes, either of these will do the job:

with torch.no_grad():

Or

x.detach().cpu()

1 Like

Once the loop is done (say with no_grad on), is there a way to iterate through and delete these intermediate computations?

Hi,

I still see that the memory usage stubbornly remains unchanged with the following two different approaches:

x.detach(), del x, torch.cuda.empty_cache()

I checked the attributes of x right after x.detach().cpu(), and x still has

is_cuda: True,
grad_fn: SelectBackward,
requires_grad: True

So the second approach replaces x.detach().cpu() with x = x.detach().cpu(), and the attributes of x right after x = x.detach().cpu() are as follows:

is_cuda: False,
grad_fn: None,
requires_grad: False

Even though requires_grad and the backward-related attributes go away successfully, x still remains in GPU memory. Is there any insight into how to delete x from GPU memory?

I am hesitant to use with torch.no_grad(): because I need a few variables in the loop to have requires_grad=True for the backward operation.

Thank you in advance.

Same problem here. I try to convert an fp32 tensor to an fp16 tensor with tensor.half() and delete the original fp32 tensor from memory. None of these approaches work.

It should work as described and verified here. Could you post an executable code snippet, which shows that it’s not working as intended, please?
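For reference, an executable snippet along these lines (sizes are arbitrary) should be enough to demonstrate it:

import torch

x = torch.randn(1024, 1024, 256, device='cuda')    # fp32, roughly 1 GiB
print(torch.cuda.memory_allocated())

x_half = x.half()                                   # fp16 copy, roughly 0.5 GiB on top
del x                                               # drop the fp32 original
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())                # only the fp16 tensor should remain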