Hi, I have a memory leak problem with the code below:
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.c, self.k, self.p, self.x = 256, 3, 1, 2304
        self.n, self.h, self.w = 0, 0, 0
        self.unfold = nn.Unfold(self.k, padding=self.p)
        self.conv = nn.Conv2d(self.c * self.k * self.k, self.x, 1, padding=0, groups=self.c * self.k * self.k)

    def forward(self, x: torch.Tensor):
        self.n, self.h, self.w = x.size(0), x.size(2), x.size(3)
        c1 = x.view(self.n, self.c, 1, self.h, self.w)
        c2 = self.unfold(x)
        c2 = c2.view(self.n, self.c, self.k * self.k, self.h, self.w)
        out = c1 + c2
        out = out.view(self.n, self.c * self.k * self.k, self.h, self.w)
        return self.conv(out)


if __name__ == '__main__':
    try:
        net = Net().cuda()
        x = torch.randn((1024, 256, 32, 32)).cuda()
        out = net(x)
        print(out)
    except Exception as ex:
        print(f' > CUDA memory: {torch.cuda.memory_allocated() / 1024 ** 3} GiB')
        del x, net
        print(ex)
        print(f' > CUDA memory: {torch.cuda.memory_allocated() / 1024 ** 3} GiB')
Here I define a network with a depth-wise convolution as the last layer. I'm running on a 24 GB RTX 3090, so when execution reaches self.conv(out) it is guaranteed to run out of CUDA memory because of the large tensors involved. I then catch the exception and delete the references to x and net. After del x, net the allocated CUDA memory should be zero, but I get 9 GB instead, which suggests a memory leak when the depth-wise convolution fails to allocate its output. If I change the groups argument of self.conv to anything else, the CUDA memory correctly drops to zero.
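For reference, here is a minimal sketch of the comparison I mean (the script name compare_groups.py, the command-line argument, and the groups=1 example are just illustrations, not my exact run): the same network with the groups argument of the last conv made configurable, run once per process so the two cases cannot influence each other. It also prints torch.cuda.memory_reserved() next to torch.cuda.memory_allocated(), to separate memory still held by live tensors from memory merely cached by the allocator.

import sys

import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, groups):
        super(Net, self).__init__()
        self.c, self.k, self.p, self.x = 256, 3, 1, 2304
        self.unfold = nn.Unfold(self.k, padding=self.p)
        self.conv = nn.Conv2d(self.c * self.k * self.k, self.x, 1, padding=0, groups=groups)

    def forward(self, x):
        n, h, w = x.size(0), x.size(2), x.size(3)
        c1 = x.view(n, self.c, 1, h, w)
        c2 = self.unfold(x).view(n, self.c, self.k * self.k, h, w)
        out = (c1 + c2).view(n, self.c * self.k * self.k, h, w)
        return self.conv(out)


def report(tag):
    # memory_allocated() counts memory still referenced by live tensors;
    # memory_reserved() counts what the caching allocator holds from the driver.
    print(f'{tag}: allocated {torch.cuda.memory_allocated() / 1024 ** 3:.2f} GiB, '
          f'reserved {torch.cuda.memory_reserved() / 1024 ** 3:.2f} GiB')


if __name__ == '__main__':
    # e.g. `python compare_groups.py 2304` vs `python compare_groups.py 1`
    groups = int(sys.argv[1]) if len(sys.argv) > 1 else 2304
    try:
        net = Net(groups).cuda()
        x = torch.randn((1024, 256, 32, 32)).cuda()
        print(net(x))
    except Exception as ex:
        report(f'groups={groups}, before del')
        del x, net
        print(ex)
        report(f'groups={groups}, after del')

Running it as python compare_groups.py 2304 and then python compare_groups.py 1 reproduces the difference for me: the depth-wise case still reports about 9 GiB allocated after the del, while the other case reports zero.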