How to get around a memory leak caused by accumulating the loss in a for loop?

Hi, I have received a model from someone else, and I am trying to train it with our training flow.

For this model, the most convenient way is to generate a prediction and label for one sample at a time. So I am doing something like this:

total_loss = 0
for i in range(len(dataset)):
    pred = model(dataset[i])
    gt = labels[i]
    if i % batch_size == 0:
        total_loss.backward()
        optimizer.step()
        total_loss = 0
    else:
        total_loss += CrossEntropy(pred, gt)

I understand it is not best practice to accumulate the loss in this way. My observation is: when I use a batch_size larger than 4, it runs for several epochs and then crashes with a CUDA out-of-memory error.

Is there a way in PyTorch to get around this?
I found some solutions online that suggest adding “.item()” to total_loss, but I think that would make this node non-differentiable.

I see two issues with your code.

  1. You are never zeroing the gradients of your model between optimization steps. This is not necessarily related to your memory leak, but it is most likely still a problem.

  2. I think if i % batch_size is equal to 0, the prediction made in that iteration and the corresponding computation graph might never be cleared, because with your current control flow they are never subject to any backward pass. Computing and accumulating the loss in every iteration should fix this.
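If you want to confirm where the memory goes, one quick check (just a sketch, assuming you are running on a CUDA device) is to print the allocated GPU memory once per iteration and watch it grow over the iterations that never reach a backward pass:

import torch

# inside the training loop, e.g. right after computing pred:
print(f"iter {i}: {torch.cuda.memory_allocated() / 1e6:.1f} MB allocated")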

Hope this helps:

total_loss = 0
for i in range(len(dataset)):
    pred = model(dataset[i])
    gt = labels[i]
    total_loss += CrossEntropy(pred, gt)  # accumulate the loss in every iteration
    if i % batch_size == 0:
        optimizer.zero_grad()   # clear gradients left over from the previous step
        total_loss.backward()   # frees the graphs of all accumulated losses
        optimizer.step()
        total_loss = 0
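As a follow-up on the memory side and on the “.item()” question: if even holding batch_size graphs inside total_loss is too much, you can instead call backward on each per-sample loss right away and let the gradients accumulate in the parameters. Here is a minimal sketch of that pattern, assuming model, dataset, labels, optimizer and batch_size are defined as in your snippet, and that CrossEntropy stands in for something like torch.nn.functional.cross_entropy:

import torch.nn.functional as F

optimizer.zero_grad()
running_loss = 0.0  # plain Python float, used only for logging

for i in range(len(dataset)):
    pred = model(dataset[i])
    gt = labels[i]
    # scale by batch_size so the accumulated gradient matches a batch average
    loss = F.cross_entropy(pred, gt) / batch_size
    loss.backward()               # frees this sample's graph right away
    running_loss += loss.item()   # .item() is safe here: backward has already run
    if (i + 1) % batch_size == 0:
        optimizer.step()
        optimizer.zero_grad()
        running_loss = 0.0

This keeps at most one computation graph alive at a time, and using .item() on the running loss is harmless because it only happens after backward has already been called.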