As an exercise for my students, I am creating various small PyTorch examples that each exhibit a different issue, which the students then have to debug and fix. I would like to include an example of a memory leak on the GPU.
However, I am having a hard time actually writing a small example that exhibits such a leak. This was my best attempt:
import torch
import torch.nn as nn

device = torch.device('cuda')
input_size = 500
hidden_size = 700

# Synthetic regression data: 1,000,000 samples of dimension input_size
Xset = torch.utils.data.TensorDataset(torch.rand(1000000, input_size),
                                      torch.rand(1000000, 1))
lossF = nn.functional.huber_loss

# A plain fully connected network
Bob_net = nn.Sequential(nn.Linear(input_size, hidden_size),
                        nn.ReLU(),
                        nn.Linear(hidden_size, hidden_size),
                        nn.ReLU(),
                        nn.Linear(hidden_size, hidden_size),
                        nn.ReLU(),
                        nn.Linear(hidden_size, hidden_size),
                        nn.ReLU(),
                        nn.Linear(hidden_size, hidden_size),
                        nn.ReLU(),
                        nn.Linear(hidden_size, 1))

trainLoader = torch.utils.data.DataLoader(Xset, batch_size=64)
Bob_net.to(device)
optimizer = torch.optim.Adam(Bob_net.parameters())

losses = []
for iEpoch in range(30):
    print(f"Allocated memory: {torch.cuda.memory_allocated() / (1024 ** 2)} MB")
    for xbatch, ybatch in trainLoader:
        xbatch = xbatch.to(device)
        ybatch = ybatch.to(device)
        pred = Bob_net(xbatch)
        loss = lossF(ybatch, pred)
        Bob_net.zero_grad()
        loss.backward()
        optimizer.step()
        # intended "leak": the loss is appended without detach(),
        # so every appended tensor still references its autograd graph
        losses.append(loss)
Here, the allocated memory does grow, but apparently only because the loss tensors are never moved off the GPU: the rising memory usage is just what you get from keeping a long list of small tensors in GPU memory. The intention was that, since the loss is never detached from the graph, every computation graph would be retained, leading to a much larger increase in memory usage. However, the growth in allocated memory is the same as if I simply do:
L = []
for i in range(468750):
    L.append(torch.rand(1).to(device))
(where 468750 happens to be the length of the 'losses' list from the first example).
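To compare the two cases more directly, I imagine a side-by-side test along these lines (just a rough sketch with a tiny stand-in model instead of Bob_net; 'run', 'n_steps' and the shapes are placeholders, not taken from my actual script):

import torch

device = torch.device('cuda')
n_steps = 1000  # placeholder for the number of batches

def run(detach_loss):
    # tiny stand-in for the real model: a single linear map with a square loss
    x = torch.rand(64, 500, device=device)
    w = torch.rand(500, 1, device=device, requires_grad=True)
    kept = []
    for _ in range(n_steps):
        loss = ((x @ w) ** 2).mean()
        loss.backward()
        w.grad = None
        # keep either the attached or the detached loss tensor
        kept.append(loss.detach() if detach_loss else loss)
    # measure while the kept list is still alive
    return torch.cuda.memory_allocated() / (1024 ** 2)

print("with detach():   ", run(True), "MB")
torch.cuda.empty_cache()
print("without detach():", run(False), "MB")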
So, does anyone have an idea for how to make a small example of what can go wrong if you keep references to the graph? Or have I misunderstood the point of detach()?
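For context, my current understanding of detach() is just that it returns a tensor sharing the same data but cut off from the autograd graph, roughly as in:

import torch

a = torch.rand(3, requires_grad=True)
b = (a * 2).sum()

print(b.grad_fn)                 # <SumBackward0 ...>: b is attached to a graph
print(b.detach().grad_fn)        # None: the detached tensor carries no graph
print(b.detach().requires_grad)  # False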