I’m facing a memory leak during realtime inference with pytorch_pretrained_bert’s BertForSequenceClassification model. Although the model runs on the GPU, it is CPU memory that keeps getting exhausted.
with torch.no_grad():
    logits = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask)
logits = logits.detach().cpu().numpy()
predictions.append(logits)
del logits
gc.collect()
torch.cuda.empty_cache()
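For context, here is a self-contained version of the pattern I’m running. The real BertForSequenceClassification model and my batch tensors are not reproduced here, so the tiny linear classifier and the batch shapes are placeholders just to make the snippet runnable on CPU:

```python
import gc

import torch
import torch.nn as nn

# Placeholder standing in for BertForSequenceClassification;
# the real model and input batches are assumptions here.
model = nn.Linear(16, 2)
model.eval()

predictions = []
for _ in range(5):  # pretend these are incoming realtime batches
    b_input_ids = torch.randn(8, 16)  # placeholder batch
    with torch.no_grad():
        logits = model(b_input_ids)
    logits = logits.detach().cpu().numpy()
    predictions.append(logits)
    del logits
    gc.collect()

# Each appended array stays in CPU RAM for the lifetime of
# `predictions`, so its footprint grows by one batch of logits
# per iteration even though the GPU tensors are freed.
total_bytes = sum(a.nbytes for a in predictions)
print(len(predictions), total_bytes)  # 5 batches, 5 * 8 * 2 * 4 = 320 bytes
```

I’m aware the `predictions` list itself grows by one array per batch, but the sizes involved seem far too small to explain the memory growth I’m seeing.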
The model, b_input_ids, and b_input_mask are all moved to the GPU with .to(device). Can anyone suggest what I can do? Memory usage increases gradually with every batch.