GPU out of memory after a few steps, accumulation error?

I am training a RoBERTa masked language model, reading my input as batches of sentences from a huge file. With batch sizes of 4 to 16 I run out of GPU memory after a few batches, and batch sizes over 16 run out of memory right away. Only a batch_size of 2 works. I made sure not to accumulate anything. Is the error a result of the size of the model, or am I managing my memory badly?

def train():
    model.to(device)
    model.train()
    total_loss = 0
    total_preds = 0
    for step, batch in enumerate(train_dataloader):
        # Log the running loss and learning rate every 512 steps
        if step % 512 == 0 and step != 0:
            print(f'Batch {step}---loss: {maskedlmloss.item()}')
            writer.add_scalar('loss/training_loss', maskedlmloss.item(), step)
            writer.add_scalar('optimizer/lr', get_lr(optimizer), step)
            writer.flush()

        # Tokenize the raw sentences; the MLM collator then builds masked input_ids and labels
        encoded_inputs = tokenizer(batch, padding=True, truncation=True, return_tensors='pt')
        dic = collator(encoded_inputs['input_ids'].unbind())
        input_ids, labels = dic['input_ids'], dic['labels']

        model.zero_grad()
        maskedlmloss, prediction_scores = model(input_ids.to(device), labels=labels.to(device))

        # Track the loss as a plain Python float so the computation graph is not kept alive
        total_loss += float(maskedlmloss.item())
        total_preds += 1

        maskedlmloss.backward()
        optimizer.step()
        scheduler.step()
    return total_loss / total_preds

Thanks in advance for your help.

Are you using variable input shapes? If so, could you check if the largest batch would fit into the model?
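
One way to check that is to run a single forward and backward pass on a dummy batch padded to the maximum sequence length the model will ever see, which represents the worst case. A rough sketch of that idea, assuming a standard RoBERTa MLM setup (the checkpoint name, batch_size, and max_length below are placeholders, not values from your script):

import torch
from transformers import RobertaForMaskedLM, RobertaTokenizerFast

device = torch.device('cuda')
tokenizer = RobertaTokenizerFast.from_pretrained('roberta-base')  # placeholder checkpoint
model = RobertaForMaskedLM.from_pretrained('roberta-base').to(device)
model.train()

batch_size, max_length = 8, 512  # worst case: every sequence at maximum length

# Dummy batch filled with random token ids, padded to max_length
dummy_ids = torch.randint(5, tokenizer.vocab_size, (batch_size, max_length), device=device)
labels = dummy_ids.clone()

outputs = model(input_ids=dummy_ids, labels=labels)
loss = outputs[0]   # first element is the masked LM loss when labels are passed
loss.backward()     # the backward pass also allocates the gradient buffers
print(f'peak memory: {torch.cuda.max_memory_allocated() / 1024**2:.0f} MB')

If the peak stays below the GPU capacity for this worst case, real batches of the same batch_size should also fit.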

All my batches are of the same size, and I play with this size to find the largest one that fits. I am not sure if that’s what you are asking, but the largest batch_size with which the model can carry out its computations without running out of memory is 2.

Thank you.

Are you seeing any memory increase in nvidia-smi for a batch size of 2?
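
Alongside nvidia-smi, you could also print the allocator stats from inside the loop to see whether usage keeps growing from step to step. A minimal sketch (log_gpu_memory is just an illustrative helper, not part of your code):

import torch

def log_gpu_memory(step):
    # memory_allocated: memory currently occupied by tensors
    # memory_reserved: total memory held by PyTorch's caching allocator
    allocated = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f'step {step}: allocated {allocated:.0f} MB, reserved {reserved:.0f} MB')

# e.g. call it right after optimizer.step() and watch whether the numbers keep increasing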