Memory leak when appending tensors to a list

Here is the snippet:

    for ii, line in enumerate(lines):
        if line['evidence'][-1][-1][-1] is not None:
            #feats, labels = tf_idf_claim(line)
            feats, ev_sent, _ = fasttext_claim(line, ft_model, db, params)
            labels = ind2indicator(ev_sent, feats.shape[-1])
            all_labels.append(labels.numpy())

            # in-place unsqueeze/transpose, then move the input to the GPU
            pred_labels, scores = model(feats.unsqueeze_(0).transpose_(1, 2).cuda(device_id))
            all_scores[ii] = scores

I see that the GPU errors out with OOM after a while. This is running on a Tesla K80, and it is the only process running on the GPU.
If I change the last line to all_scores[ii] = scores.detach().cpu().numpy(), the problem goes away.

I’m not sure how scores was calculated, but it could still hold a reference to the computation graph.
If that’s the case, you are storing the whole computation graph in your list in each iteration, which eventually fills up your GPU memory.
Detaching the tensor should fix this issue as you’ve already mentioned.
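
To make this concrete, here is a minimal sketch of the pattern (the linear model, shapes, and loop are placeholders for illustration, not your actual code):

    import torch
    import torch.nn as nn

    model = nn.Linear(100, 10).cuda()  # hypothetical stand-in for the real model

    stored = []
    for i in range(1000):
        x = torch.randn(1, 100, device='cuda')
        scores = torch.softmax(model(x), dim=-1)

        # Leaks: `scores` carries a grad_fn, so each iteration's whole
        # computation graph (and its intermediate GPU tensors) stays alive.
        # stored.append(scores)

        # Safe: detach() drops the graph reference, and .cpu().numpy()
        # frees the GPU copy once `scores` goes out of scope.
        stored.append(scores.detach().cpu().numpy())

        if i % 200 == 0:
            print(torch.cuda.memory_allocated())  # stays flat with the safe version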

The scores come from softmax outputs and are tensors of shape [1, 10, 100]. Does it mean that any tensor calculated by the model holds a reference to the whole graph?
I am new to PyTorch, so my question might be dumb. Thanks for your help.

In this case the softmax output needs the computation graph to be able to calculate the gradients in the backward pass.
You can check this by printing its grad_fn:

    print(scores.grad_fn)
    # Should print something like:
    # <SoftmaxBackward at 0x7f371721a668>

If you store something from your model (for debugging purposes) and don't need to calculate gradients with it anymore, I would recommend calling detach() on it; this is always safe, since it has no effect if the tensor is already detached.
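
A quick illustration (toy tensor, not your model's output):

    import torch

    x = torch.randn(3, requires_grad=True)
    y = torch.softmax(x, dim=0)
    print(y.grad_fn)           # a SoftmaxBackward node: graph is attached

    z = y.detach()
    print(z.grad_fn)           # None: the graph reference is gone
    print(z.detach().grad_fn)  # still None: detaching again is a no-op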

That makes sense to me now. Thanks for the detailed answer.