When I run a basic linear layer, I encounter a strange memory leak. See the code below:
from pynvml import nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo
import torch
import gc
def get_free_space(idx=0):
    nvmlInit()
    h = nvmlDeviceGetHandleByIndex(idx)
    info = nvmlDeviceGetMemoryInfo(h)
    return info.free
linear_layer = torch.nn.Linear(768, 768).to("cuda:0")
with torch.no_grad():
    print(get_free_space(0))
    a_detach = torch.zeros((128, 129, 768)).to("cuda:0")
    d = linear_layer(a_detach)
    del d
    del a_detach
    gc.collect()
    torch.cuda.empty_cache()
    print(get_free_space(0))
Output:
7680950272
7639007232
The code run inside the no_grad block should ONLY be creating two tensors, a_detach and d, both of which are promptly deleted. So why was an additional ~40 MB (41,943,040 bytes) of memory lost?
Note that this leak continues to occur even after deleting the layer itself! (del linear_layer after the other dels).
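For reference, here is a minimal extra check (using the standard torch.cuda.memory_allocated / torch.cuda.memory_reserved API) that could be added right after the second print; its output is not shown above, so this is only a diagnostic sketch:

# Bytes held by live tensors vs. bytes cached by PyTorch's allocator.
# If both are near zero after the deletions, the "missing" memory would
# appear to be held outside the caching allocator (e.g. the CUDA context
# or library workspaces) rather than by leaked tensors.
print(torch.cuda.memory_allocated(0))
print(torch.cuda.memory_reserved(0))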