GPU out of memory after a few steps, accumulation error?

I am training a RoBERTa masked language model, reading my input as batches of sentences from a huge file. With batch sizes of 4 to 16 I run out of GPU memory after a few batches, and batch sizes over 16 run out of memory right away. Only a batch_size of 2 works. I made sure not to accumulate anything. Is the error a result of the size of the model, or am I managing my memory badly?

def train():
    model.to(device)
    model.train()
    total_loss = 0
    total_preds = 0
    for step, batch in enumerate(train_dataloader):
        # Log the running loss and learning rate every 512 steps
        if step % 512 == 0 and step != 0:
            print(f'Batch {step}---loss: {maskedlmloss.item()}')
            writer.add_scalar('loss/training_loss', maskedlmloss.item(), step)
            writer.add_scalar('optimizer/lr', get_lr(optimizer), step)
            writer.flush()

        # Tokenize the raw sentences; the MLM collator then builds masked input_ids and labels
        encoded_inputs = tokenizer(batch, padding=True, truncation=True, return_tensors='pt')
        dic = collator(encoded_inputs['input_ids'].unbind())
        input_ids, labels = dic['input_ids'], dic['labels']

        model.zero_grad()
        maskedlmloss, prediction_scores = model(input_ids.to(device), labels=labels.to(device))

        # Track the loss as a plain Python float so the computation graph is not kept alive
        total_loss += float(maskedlmloss.item())
        total_preds += 1

        maskedlmloss.backward()
        optimizer.step()
        scheduler.step()
    return total_loss / total_preds

Thanks in advance for your help.

Are you using variable input shapes? If so, could you check if the largest batch would fit into the model?
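
One way to check that is to run a single forward and backward pass on a dummy batch padded to the maximum sequence length the model will ever see, which represents the worst case. A rough sketch of that idea, assuming a standard RoBERTa MLM setup (the checkpoint name, batch_size, and max_length below are placeholders, not values from your script):

import torch
from transformers import RobertaForMaskedLM, RobertaTokenizerFast

device = torch.device('cuda')
tokenizer = RobertaTokenizerFast.from_pretrained('roberta-base')  # placeholder checkpoint
model = RobertaForMaskedLM.from_pretrained('roberta-base').to(device)
model.train()

batch_size, max_length = 8, 512  # worst case: every sequence at maximum length

# Dummy batch filled with random token ids, padded to max_length
dummy_ids = torch.randint(5, tokenizer.vocab_size, (batch_size, max_length), device=device)
labels = dummy_ids.clone()

outputs = model(input_ids=dummy_ids, labels=labels)
loss = outputs[0]   # first element is the masked LM loss when labels are passed
loss.backward()     # the backward pass also allocates the gradient buffers
print(f'peak memory: {torch.cuda.max_memory_allocated() / 1024**2:.0f} MB')

If the peak stays below the GPU capacity for this worst case, real batches of the same batch_size should also fit.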

All my batches are of the same size, and I play with this size to find the largest one that fits. I am not sure if that’s what you are asking, but the largest batch_size with which the model can carry out its computations without running out of memory is 2.

Thank you.

Are you seeing any memory increase in nvidia-smi for a batch size of 2?
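
Alongside nvidia-smi, you could also print the allocator stats from inside the loop to see whether usage keeps growing from step to step. A minimal sketch (log_gpu_memory is just an illustrative helper, not part of your code):

import torch

def log_gpu_memory(step):
    # memory_allocated: memory currently occupied by tensors
    # memory_reserved: total memory held by PyTorch's caching allocator
    allocated = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f'step {step}: allocated {allocated:.0f} MB, reserved {reserved:.0f} MB')

# e.g. call it right after optimizer.step() and watch whether the numbers keep increasing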