Same here. I tried adding calls to torch.cuda.empty_cache() and gc.collect(); neither helped.
import torch
from torch import nn
import gc

x = torch.rand(1, 1, 144, 144, 144).to('cuda:0')
mp = nn.MaxPool3d(2, 2, return_indices=True)
mup = nn.MaxUnpool3d(2, 2)

while True:
    import pdb; pdb.set_trace()
    y, i = mp(x)
    a = mup(y, i)
    torch.cuda.empty_cache()
    gc.collect()
nvidia-smi output shows GPU memory usage increasing by roughly 5-10 MiB per iteration of the while loop.
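One thing worth noting: nvidia-smi reports the whole process's footprint (CUDA context plus everything the caching allocator has reserved), so it can grow even when no tensors leak. A way to narrow this down is to compare it against torch.cuda.memory_allocated, which tracks only live tensor allocations. A rough sketch of that measurement (the function name measure_leak and the iteration count are mine, not from the original repro):

```python
import gc
import torch

def measure_leak(iters=5, device='cuda:0'):
    """Run the MaxPool3d/MaxUnpool3d round-trip and record
    torch.cuda.memory_allocated after each iteration."""
    x = torch.rand(1, 1, 144, 144, 144, device=device)
    mp = torch.nn.MaxPool3d(2, 2, return_indices=True)
    mup = torch.nn.MaxUnpool3d(2, 2)
    readings = []
    for _ in range(iters):
        y, i = mp(x)
        a = mup(y, i)
        del y, i, a               # drop Python references before measuring
        gc.collect()
        torch.cuda.empty_cache()  # return cached blocks to the driver
        readings.append(torch.cuda.memory_allocated(device))
    return readings

if torch.cuda.is_available():
    print(measure_leak())
```

If memory_allocated stays flat while nvidia-smi keeps climbing, the growth is happening outside PyTorch's caching allocator, which would point at the backend kernels rather than leaked tensors.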