GPU memory filling up after each batch

I wrote this function to imitate a time-delay neural network:

import numpy as np
import torch
from torch.autograd import Variable

def SGD(batch, weight, bias):

    Layers = [0, 0, 0, 0, 0, 0, 0, 0]
    userCount = np.asarray(batch["lable"]).shape[1]
    # frame-level (TDNN) layers
    Layers[0] = torch.nn.Conv1d(24, 512, 5).cuda()
    Layers[1] = torch.nn.Conv1d(512, 512, 3, dilation=2).cuda()
    Layers[2] = torch.nn.Conv1d(512, 512, 3, dilation=3).cuda()
    Layers[3] = torch.nn.Conv1d(512, 512, 1).cuda()
    Layers[4] = torch.nn.Conv1d(512, 1500, 1).cuda()
    # segment-level layers
    Layers[5] = torch.nn.Linear(3000, 512).cuda()
    Layers[6] = torch.nn.Linear(512, 512).cuda()
    Layers[7] = torch.nn.Linear(512, userCount).cuda()

    lable = Variable(torch.FloatTensor(batch["lable"]).cuda(), requires_grad=False)
    layer0input = Variable(torch.FloatTensor(batch["data"]).cuda(), requires_grad=False)
    layer1input = torch.sigmoid(Layers[0](layer0input))
    layer2input = torch.sigmoid(Layers[1](layer1input))
    layer3input = torch.sigmoid(Layers[2](layer2input))
    layer4input = torch.sigmoid(Layers[3](layer3input))
    layer4out = torch.sigmoid(Layers[4](layer4input))
    # statistics pooling over the time dimension
    mean = torch.mean(layer4out, dim=2)
    std = torch.std(layer4out, dim=2)
    layer5input = torch.cat([mean, std], dim=1)
    layer6input = torch.sigmoid(Layers[5](layer5input))
    layer7input = torch.sigmoid(Layers[6](layer6input))
    softmax_input = torch.sigmoid(Layers[7](layer7input))
    softmax_out = torch.nn.functional.softmax(softmax_input, dim=1)
    # cross-entropy-style loss between the softmax output and the one-hot labels
    loss = - torch.trace(torch.mm(torch.log(softmax_out), torch.t(lable)))
    print(loss)
    loss.backward()

The problem is that it fills GPU memory on each call, so I get “cuda runtime error (2) : out of memory” after a few batches.

I tried “del loss” and “del Layers”, but it doesn’t help.

I’m using PyTorch 0.3.1 and Python 3.6.4.

Currently it seems you are creating the layers anew in each call, so their weights are re-initialized for every batch and I doubt the model will learn anything at all.

Try to create a model using the nn.Module class:

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # your layer definitions
        self.conv1 = nn.Conv2d(...)
        self.conv2 = nn.Conv2d(...)

    def forward(self, x):
        # your forward pass
        x = self.conv1(x)
        x = self....
        return x
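
The training loop could then look roughly like this (the SGD optimizer and the loader iterable are just placeholders here, and the loss is the one from your post):

# Rough sketch, not a drop-in solution: the model and optimizer are created once,
# outside the batch loop, so the parameters persist and can actually be learned.
model = MyModel().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for batch in loader:  # `loader` stands for whatever yields your batches
    data = Variable(torch.FloatTensor(batch["data"]).cuda())
    lable = Variable(torch.FloatTensor(batch["lable"]).cuda(), requires_grad=False)

    optimizer.zero_grad()       # clear the gradients from the previous batch
    output = model(data)        # forward pass
    loss = -torch.trace(torch.mm(torch.log(output), torch.t(lable)))
    loss.backward()
    optimizer.step()            # update the weights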

Could you try that and check whether you still run out of memory?

Thank you, I will try this and report the results.

I tried this:

class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv0 = torch.nn.Conv1d(24, 512, 5).cuda()
        self.conv1 = torch.nn.Conv1d(512, 512, 3, dilation=2).cuda()
        self.conv2 = torch.nn.Conv1d(512, 512, 3, dilation=3).cuda()
        self.conv3 = torch.nn.Conv1d(512, 512, 1).cuda()
        self.conv4 = torch.nn.Conv1d(512, 1500, 1).cuda()
        self.lin1 = torch.nn.Linear(3000, 512).cuda()
        self.lin2 = torch.nn.Linear(512, 512).cuda()
        self.lin3 = torch.nn.Linear(512, 2000).cuda()

    def forward(self, x):
        x = torch.sigmoid(self.conv0(x))
        x = torch.sigmoid(self.conv1(x))
        x = torch.sigmoid(self.conv2(x))
        x = torch.sigmoid(self.conv3(x))
        x = torch.sigmoid(self.conv4(x))
        mean = torch.mean(x, dim=2)
        std = torch.std(x, dim=2)
        x = torch.cat([mean, std], dim=1)
        x = torch.sigmoid(self.lin1(x))
        x = torch.sigmoid(self.lin2(x))
        x = torch.sigmoid(self.lin3(x))
        x = torch.nn.functional.softmax(x, dim=1)
        return x

    def backward(self, x, lable):
        loss = - torch.trace(torch.mm(torch.log(x), torch.t(lable)))
        loss.backward()
        return self.conv0.weight.grad.data

When I’m calling just forward it works without problems, but when I try this:

for i in range(10):
    x = Net.forward(Variable(torch.FloatTensor(np.random.rand(128, 24, 500)).cuda(), requires_grad=False))
    print(Net.backward(x, Variable(torch.FloatTensor(np.random.rand(128, 2000)).cuda(), requires_grad=False)))
    print(get_gpu_memory_map())

it starts to fill the GPU memory like it did before.

Your code runs fine on my machine. Which PyTorch version are you using?

Also some minor points:

  • usually it’s better to remove the .cuda() calls from your class members and just call model.cuda() once after creating the model
  • instead of model.forward(x) you should just call model(x) directly (both points are sketched below)
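
Something like this, just to illustrate the two points (using the layer shapes from your post; it’s only a sketch, not your full model):

# The layers are defined without .cuda(); a single model.cuda() call then moves
# every registered parameter to the GPU at once.
class SketchModel(torch.nn.Module):
    def __init__(self):
        super(SketchModel, self).__init__()
        self.conv0 = torch.nn.Conv1d(24, 512, 5)   # no .cuda() here
        self.lin3 = torch.nn.Linear(512, 2000)

    def forward(self, x):
        x = torch.sigmoid(self.conv0(x))
        x = torch.mean(x, dim=2)                    # pool over time so the linear layer fits
        return self.lin3(x)

net = SketchModel().cuda()                          # moves all parameters in one call
x = Variable(torch.FloatTensor(np.random.rand(128, 24, 500)).cuda())
out = net(x)                                        # calling net(x) invokes forward for you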

I’m using version 0.3.1

I just tested it with 0.3.1.post2 (on CPU only) and my memory usage was growing up to 38GB.
Could you try to build PyTorch from source? Maybe it’s a leak that was recently fixed.
You can find the build instructions here.
It should be quite easy to compile it, but let me know if you encounter any issues.

I’ve just found some sort of solution: I’ll run each batch in a subprocess via “multiprocessing”, save the new weights to a manager, and reinitialize the weights before the next call. When the subprocess ends, the memory it occupied is freed.
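
Something along these lines (a rough sketch of the idea; the helper function and the way the weights are passed around are just for illustration):

# Rough sketch: every batch runs in its own child process, so all GPU memory that
# the child allocated is released when it exits. Weights go back and forth through
# a multiprocessing Manager as CPU tensors.
import multiprocessing as mp

def run_batch(batch, weights, result):
    model = MyModel().cuda()
    if weights:                                  # reinitialize with the previous weights
        model.load_state_dict(weights)
    # ... forward / backward / weight update for this batch ...
    result["weights"] = {k: v.cpu() for k, v in model.state_dict().items()}

manager = mp.Manager()
result = manager.dict()
weights = {}
for batch in batches:                            # `batches` stands for the batch iterable
    p = mp.Process(target=run_batch, args=(batch, weights, result))
    p.start()
    p.join()                                     # the child's GPU memory is freed here
    weights = result["weights"]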