Cuda runtime error (2): out of memory

ZeweiChu · February 27, 2017, 4:03pm

I wrote some LSTM based code for language modeling:

 def forward(self, input, hidden):
         emb = self.encoder(input)
         h, c = hidden
         h.data.squeeze_(0)
         c.data.squeeze_(0)

         seq_len = input.size(0)
         batch_size = input.size(1)
         output_dim = h.size(1)

         output = [] 
         for i in range(seq_len):        
             h, c = self.rnncell(emb[i], (h, c))
             # self.hiddens: time * batch * nhid
             if i == 0:
                 self.hiddens = h.unsqueeze(0)
             else:
                 self.hiddens = torch.cat([self.hiddens, h.unsqueeze(0)])
             # h: batch * nhid
             #self.att = h.unsqueeze(0).expand_as(self.hiddens)

             self.hiddens = self.hiddens.view(-1, self.nhid)
             b = torch.mm(self.hiddens, self.U).view(-1, batch_size, 1)
             a = torch.mm(h, self.W).unsqueeze(0).expand_as(b)
             att = torch.tanh(a + b).view(-1, batch_size)
             att = self.softmax(att.t()).t()
             self.hiddens = self.hiddens.view(-1, batch_size, self.nhid)
             att = att.unsqueeze(2).expand_as(self.hiddens)
             output.append(torch.sum(att * self.hiddens, 0)) #hidden.data

         output = torch.cat(output)

         decoded = self.decoder(output.view(output.size(0)*output.size(1), output.size(2)))
         decoded = self.logsoftmax(decoded)
         output = decoded.view(output.size(0), output.size(1), decoded.size(1)) 
         return output, (h, c)

And I got error in backward():

RuntimeError: cuda runtime error (2) : out of memory at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.9_1487343590888/work/torch/lib/THC/generic/THCStorage.cu:66

Any ideas why it might happen?

The memory goes to 5800MB very quickly in the first 10 batches, and then it keeps running with this much memory occupied for another several hundred batches, and then it runs out of memory.

apaszke · February 27, 2017, 6:24pm

Is there any reason why you’re keeping self.hiddens?

ZeweiChu · February 27, 2017, 6:25pm

No, I don’t have to keep it. Is it a bad thing to keep unnecessary variables in the model?

smth · February 27, 2017, 6:30pm

if you keep Variables around, the corresponding graph that created these Variables is kept around. Hence the elevated memory usage…

apaszke · February 27, 2017, 6:37pm

@ZeweiChu yes, it’s good practice to make your model stateless. It’s best if you only keep references to parameters, and all intermediate values generated in forward are not saved anywhere for extended periods of time.

ZeweiChu · February 27, 2017, 6:52pm

The main part of my code looks like this.

 def repackage_variable(v, volatile=False):
     return [Variable(torch.from_numpy(h), volatile=volatile).unsqueeze(1) for h in v]

 for k in range(len(minbatches)):
         
         minbatch = minbatches[perm[k]]
         x_padded = utils.make_mask(minbatch)
         x_padded = repackage_variable(x_padded, False)
         x_padded = torch.cat(x_padded, 1)
         T = x_padded.size(0)
         B = x_padded.size(1)
         inp = x_padded[:T-1, :].long()
         target = x_padded[1:, :].long().view(-1, 1)
         if use_cuda:
             inp = inp.cuda()
             target = target.cuda()       

  
         mask = (inp != 0).float().view(-1, 1)
         
         hidden = model.init_hidden(batch)
         model.zero_grad()
         #print(inp.size())
         output, hidden = model(inp, hidden)
         output = output.view(-1, n_vocab)
         
         loss = output.gather(1, target) * mask
         loss = -torch.sum(loss) / torch.sum(mask)
         loss.backward()
         

         optimizer.step()

My question is, at each iteration, since all "Variable"s “inp” and “target” are overwritten, will the model state variables like “self.hiddens” also be overwritten? Does the old computation graph still exist in the next iteration?

nvidia-smi shows that about 6G of memory is used, but I am only testing on batch size of 50, and the length should be at most 200, why would it take up so much memory? And the memory size increases among iterations from time to time, but it could stay the same for a while. Any clues what might be the reason?

ruotianluo · February 27, 2017, 8:26pm

Won’t self.hiddens be cleaned after backward?

apaszke · February 27, 2017, 8:43pm

@ruotianluo how would they get cleaned up? It’s a reference. We’ll free most of the buffers, but I think there might still be some of them alive. This is going to change in the upcoming releases btw.

@ZeweiChu I can’t see anything wrong with your example. The only suggestion would be to convert the input into Variables as late as you can (e.g. do cat, type casts and copies on tensors not Variables). Maybe that’s how much memory your model requires. Are you sure it can even fit in memory?

ruotianluo · February 27, 2017, 8:55pm

So, the reference should be cleaned up after self.hiddens is overwritten by next forward? Is it correct?

apaszke · February 28, 2017, 1:02am

Yes. It won’t be kept there indefinitely, but it still can postpone some frees and increase the overall memory usage.

biswajitsc · June 2, 2017, 7:31am

Any progress on this one? I am facing a similar issue. I have implemented an LSTM and the memory remains constant for about 9000 iterations after which it runs out of memory. I am not keeping any references of the intermediate Variables.

I am running this on a 12GB Titan X GPU on a shared server.

biswajitsc · June 4, 2017, 3:56pm

Finally fixed it. There was problem in my code. I was unaware that x = y[a:b] is not a deep copy of y. I was modifying x, and in turn modifying y, and increasing the size of the data in every iteration. Using x = copy.deepcopy(y[a:b]) fixed it for me.

RangoHU · June 10, 2017, 7:36am

So did you figure out why your memory usage keeps increasing? I had the exact same question as you did. Thanks.

hmishfaq · August 18, 2017, 11:39pm

How can I manually free the memory? For example, how would you clean up self.hidden here?

John_Zhang · August 29, 2017, 5:26am

hi
why moidifying y will increase size of the data? i have similar problems

hmishfaq · August 29, 2017, 9:10pm

I was suggested to do del self.hidden before return output