I am training a RoBERTa masked language model, reading my input as batches of sentences from a huge file. With batch sizes of 4 to 16 I run out of GPU memory after a few batches, and batch sizes over 16 run out of memory right away. Only a batch_size of 2 works. I made sure not to accumulate anything. Is this error a result of the model's size, or am I managing my memory badly?
def train():
    model.to(device)
    model.train()
    total_loss = 0
    total_preds = 0

    for step, batch in enumerate(train_dataloader):
        # Log loss and learning rate every 512 batches (skipped at step 0,
        # where maskedlmloss does not exist yet).
        if step % 512 == 0 and step != 0:
            print(f'Batch {step}---loss: {maskedlmloss.item()}')
            writer.add_scalar('loss/training_loss', maskedlmloss.item(), step)
            writer.add_scalar('optimizer/lr', get_lr(optimizer), step)
            writer.flush()

        # Tokenize the raw sentences, then let the collator apply MLM masking.
        encoded_inputs = tokenizer(batch, padding=True, truncation=True, return_tensors='pt')
        dic = collator(encoded_inputs['input_ids'].unbind())
        input_ids, labels = dic['input_ids'], dic['labels']

        model.zero_grad()
        maskedlmloss, prediction_scores = model(input_ids.to(device), labels=labels.to(device))

        # Accumulate a Python float, not the loss tensor, so the graph is not kept alive.
        total_loss += float(maskedlmloss.item())
        total_preds += 1

        maskedlmloss.backward()
        optimizer.step()
        scheduler.step()

    return total_loss / total_preds
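For reference, this is the kind of check I was planning to add inside the loop to confirm nothing is being held onto between steps. It is just a sketch; the log_gpu_memory helper is my own, but torch.cuda.memory_allocated, max_memory_allocated, and reset_peak_memory_stats are the standard PyTorch calls:

import torch

def log_gpu_memory(step, device):
    # Currently allocated tensor memory vs. the peak since the last reset;
    # a steadily growing "allocated" value would mean something is being
    # kept alive across iterations.
    allocated = torch.cuda.memory_allocated(device) / 1024**2
    peak = torch.cuda.max_memory_allocated(device) / 1024**2
    print(f'step {step}: allocated {allocated:.0f} MiB, peak {peak:.0f} MiB')

# inside the training loop, e.g. alongside the existing logging every 512 steps:
#     log_gpu_memory(step, device)
#     torch.cuda.reset_peak_memory_stats(device)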
Thanks in advance for your help.