GPU out of memory after a few steps, accumulation error?

I am training a RoBERTa masked language model, reading my input as batches of sentences from a huge file. With batch sizes of 4 to 16 I run out of GPU memory after a few batches. Batch sizes over 16 run out of memory right away. Only a batch_size of 2 works. I made sure not to accumulate anything. Is the error a result of the size of the model, or am I managing my memory badly?

def train():
    total_loss = 0
    total_preds = 0
    for step,batch in enumerate(train_dataloader):
        if step % 512 == 0 and step != 0:
            print(f'Batch {step}---loss: {maskedlmloss.item()}')
            writer.add_scalar('loss/training_loss', maskedlmloss.item(), step)
            writer.add_scalar('optimizer/lr', get_lr(optimizer), step)

        encoded_inputs = tokenizer(batch, padding=True, truncation=True, return_tensors='pt')
        dic = collator(encoded_inputs['input_ids'].unbind())
        input_ids, labels = dic['input_ids'], dic['labels']
        maskedlmloss, prediction_scores = model(input_ids, labels=labels)

        total_loss += float(maskedlmloss.item())
        total_preds += 1

    return total_loss/total_preds

Are you using variable input shapes? If so, could you check if the largest batch would fit into the model?
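To illustrate what "variable input shapes" means here: with padding=True the tokenizer pads each batch to its own longest sentence, so the tensor shape can change from batch to batch even when batch_size is fixed. The sketch below is only a rough illustration. It uses whitespace splitting as a stand-in for the real tokenizer, and the padded_shape helper and the max_length of 512 are assumptions, not part of the posted code.

```python
def padded_shape(batch, max_length=512):
    """Shape of input_ids when each batch is padded to its longest sentence."""
    # Whitespace splitting is a stand-in for the real tokenizer (assumption).
    # truncation=True caps every sentence at max_length tokens.
    lengths = [min(len(sentence.split()), max_length) for sentence in batch]
    return (len(batch), max(lengths))


short_batch = ["a short sentence", "another short one"]
mixed_batch = ["a short sentence", "word " * 300]

print(padded_shape(short_batch))  # (2, 3)
print(padded_shape(mixed_batch))  # (2, 300)
```

Both batches hold two sentences, but the second one is padded to 100 times as many token positions. Activation memory grows with batch_size times sequence length, so a single long sentence in a batch can be what triggers the out-of-memory error a few steps in.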

All my batches are of the same size, and I vary this size to find the largest that fits in the model. I am not sure if that’s what you are asking, but the largest batch_size on which the model can carry out its computations without running out of memory is 2.

Are you seeing any memory increase in nvidia-smi for a batch size of 2?
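One way to check this without watching the terminal is to poll nvidia-smi from the training script every few hundred steps. The following is a minimal sketch, assuming nvidia-smi is on the PATH. The helper names parse_memory_used and gpu_memory_used_mib are made up for illustration.

```python
import shutil
import subprocess


def parse_memory_used(smi_output):
    """Parse one used-memory value (MiB) per GPU from nvidia-smi CSV output."""
    values = []
    for line in smi_output.splitlines():
        line = line.strip()
        if line.isdigit():  # skip blank lines and non-numeric entries
            values.append(int(line))
    return values


def gpu_memory_used_mib():
    """Return used memory per GPU in MiB, or None if nvidia-smi cannot be run."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return parse_memory_used(result.stdout)
```

Calling gpu_memory_used_mib() inside the existing `if step % 512 == 0` branch and printing the result would show whether usage climbs steadily (which suggests something is being kept alive across steps) or jumps only on particular batches (which points to input shape). Inside the process, torch.cuda.memory_allocated() and torch.cuda.max_memory_allocated() report similar information for PyTorch's own allocations.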