Memory leak during real-time inference with pytorch_pretrained_bert's BertForSequenceClassification model


I’m seeing a memory leak during real-time inference with pytorch_pretrained_bert’s BertForSequenceClassification model.

Although inference runs on the GPU, it is CPU memory that keeps growing until it is exhausted.

    with torch.no_grad():  # inference only, so no autograd graph is built
        logits = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask)
        logits = logits.detach().cpu().numpy()  # copy the result to host (CPU) memory
        del logits

model, b_input_ids and b_input_mask are all moved to the GPU with .to(device).

Can anyone suggest what I can do? Memory usage increases gradually with each request.

I see that your code appends the logits to predictions, which will increase CPU memory usage.
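To illustrate: a list that collects every batch's logits keeps growing for as long as the service runs. Below is a minimal sketch of one way to bound it, assuming the results can be handed off in chunks. The names `collect_in_chunks`, `flush_every` and `handle_chunk` are hypothetical and not part of any library, and plain Python lists stand in for the NumPy arrays returned by `logits.detach().cpu().numpy()`.

```python
def collect_in_chunks(batches, flush_every, handle_chunk):
    """Buffer per-batch outputs and hand them off every `flush_every` batches.

    Hypothetical helper: `batches` is any iterable of per-batch outputs and
    `handle_chunk` is a callback that consumes a list of buffered outputs
    (for example, writes them to disk or returns them to a client).
    Returns the largest number of batches held in the buffer at once.
    """
    predictions = []
    peak = 0
    for batch_logits in batches:
        predictions.append(batch_logits)
        peak = max(peak, len(predictions))
        if len(predictions) >= flush_every:
            handle_chunk(predictions)
            # Rebind to a fresh list so the chunk just handed off is not mutated.
            predictions = []
    if predictions:
        handle_chunk(predictions)  # hand off the final partial chunk
    return peak
```

With this shape the buffer never holds more than `flush_every` batches, provided `handle_chunk` does not itself keep every chunk alive.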

I have detached the outputs from the graph. Also, the predictions array is emptied every few iterations.
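If the predictions list is already being cleared, it may help to measure which step actually retains memory. Here is a rough sketch using the standard-library `tracemalloc` module. `measure_growth` and `step` are hypothetical names, where `step` would be a zero-argument function that runs one inference iteration. One caveat: `tracemalloc` only sees allocations made through Python's memory allocators, so memory allocated inside PyTorch's native code will most likely not show up here, and the process's resident memory would then be the better signal.

```python
import tracemalloc


def measure_growth(step, iterations, warmup=5):
    """Return the net bytes of Python-tracked memory gained over `iterations` calls.

    Hypothetical helper: `step` is a zero-argument callable that performs one
    iteration of the loop under suspicion.
    """
    for _ in range(warmup):
        step()  # let caches and lazy initialisation settle before measuring
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()  # returns (current, peak)
    for _ in range(iterations):
        step()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before
```

A value that scales with `iterations` suggests something in `step` is holding references. A value near zero while the process's memory still climbs points at allocations outside Python's allocator.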