Convert float32 to float16 with reduced GPU memory cost

Hi there,

I have a huge tensor (GB-level) on the GPU and I want to convert it to float16 to save some GPU memory. How can I achieve this?

I tried

a_fp16 = a.to(torch.float16)

But it actually allocates another memory block to store the fp16 tensor, and the fp32 tensor is still there.
I also tried

del a

after casting, but the memory is not released.

Thanks

That’s expected, as you are still holding references to both tensors.
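If you don't need the fp32 copy anymore, you can rebind the name instead of creating a second variable, so the original tensor loses its last reference and its memory is returned to the caching allocator. A minimal sketch, assuming a is a plain tensor without autograd history:

# rebinding drops the last reference to the fp32 tensor,
# so its memory goes back to PyTorch's caching allocator
a = a.to(torch.float16)  # equivalently: a = a.half()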

The del a behavior shouldn’t be the case, though: a would be released and the memory would be added to the cache, as seen here:

import torch

# create an FP32 tensor of 4MB
a = torch.randn(1024, 1024, device='cuda')
print(torch.cuda.memory_allocated() / 1024**2)
> 4.0
print(torch.cuda.memory_reserved() / 1024**2)
> 20.0

# create an FP16 copy of 2MB
a_fp16 = a.to(torch.float16)
print(torch.cuda.memory_allocated() / 1024**2)
> 6.0
print(torch.cuda.memory_reserved() / 1024**2)
> 20.0

# delete the 4MB FP32 tensor
del a
print(torch.cuda.memory_allocated() / 1024**2)
> 2.0
print(torch.cuda.memory_reserved() / 1024**2)
> 20.0
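Note that memory_reserved stays at 20MB, because the caching allocator keeps the freed blocks around for reuse instead of returning them to the driver. If you need to release the cached memory (e.g. for another process on the same GPU), you can clear the cache explicitly:

# return cached, unused blocks to the driver; this lowers
# memory_reserved, but does not change memory_allocated
torch.cuda.empty_cache()
print(torch.cuda.memory_reserved() / 1024**2)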

In my case, is it because tensor a is in the computation graph? Do I need to detach it to reduce memory?

I believe the issue is that the backward pass needs additional memory, as intermediate tensors are kept alive to compute the gradients.
Detaching the tensor would lower the memory usage, but it would also break training, so no, you should not detach the output/loss if you want to train the model.
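As a quick diagnostic, you could check whether autograd is tracking the tensor at all; if grad_fn is set, the tensor was produced by a recorded operation, and the graph keeps references to the intermediates until backward() is called (or the graph is freed). A minimal sketch, assuming a is the tensor in question:

# grad_fn is not None if a was created by an operation that
# autograd recorded; such tensors are referenced by the graph
print(a.requires_grad)
print(a.grad_fn)

If the fp16 copy is only needed for inference, creating it inside a torch.no_grad() block avoids recording a graph for the cast in the first place.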

I believe so. Unfortunately, del a still doesn’t work for me…

Could you post an executable code snippet that shows this non-working behavior?