Hi,
I found an elusive memory leak in my network. Unfortunately I can’t create a minimal working example, but I’ll try to describe the behavior I’m observing below. First, pseudocode of the parts of the model that I think are relevant (say this is the forward pass of the model, plus a line of loss computation at the end):
logits_a = layer_a(inputs)  # logits_a has size [B, N]
logits_b = layer_b(inputs)  # logits_b has size [B, 1]
combined_logits = torch.cat((logits_a, logits_b), dim=1)
combined_probs = torch.softmax(combined_logits, dim=1)
probs_a = combined_probs[:, :-1]  # size [B, N]
probs_b = combined_probs[:, -1]   # size [B]
...  # later on in the code...
# self._weight_a and self._weight_b are parameters of size [1, 1]
h_a = torch.matmul(probs_a.view(B * N, 1), self._weight_a)
h_b = torch.matmul(probs_b.view(B, 1), self._weight_b)  # THIS LINE
...  # later on in the code...
loss = - label * torch.log(probs_b)
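
In case it helps to see the shapes concretely, here is a self-contained toy version of just that pattern. The layer definitions, sizes, the .reshape calls, and the scalar loss reduction are placeholders I’ve filled in for illustration; I haven’t managed to turn this into a minimal example that actually reproduces the leak.

import torch
import torch.nn as nn

B, N, D = 8, 5, 16  # placeholder batch size, logit count, and input dim

class PatternSketch(nn.Module):
    # Toy stand-in for the relevant part of my model, not the real thing.
    def __init__(self):
        super().__init__()
        self.layer_a = nn.Linear(D, N)
        self.layer_b = nn.Linear(D, 1)
        # Stand-ins for self._weight_a / self._weight_b, parameters of size [1, 1].
        self._weight_a = nn.Parameter(torch.randn(1, 1))
        self._weight_b = nn.Parameter(torch.randn(1, 1))

    def forward(self, inputs, label):
        logits_a = self.layer_a(inputs)                        # [B, N]
        logits_b = self.layer_b(inputs)                        # [B, 1]
        combined_logits = torch.cat((logits_a, logits_b), dim=1)
        combined_probs = torch.softmax(combined_logits, dim=1)
        probs_a = combined_probs[:, :-1]                       # [B, N]
        probs_b = combined_probs[:, -1]                        # [B]
        # .reshape instead of .view here because these slices are not
        # contiguous in this toy version.
        h_a = torch.matmul(probs_a.reshape(B * N, 1), self._weight_a)
        h_b = torch.matmul(probs_b.reshape(B, 1), self._weight_b)  # THIS LINE
        # Reduced to a scalar so backward() works in the toy version.
        loss = -(label * torch.log(probs_b)).mean()
        return loss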
What I’m observing over multiple batches of forward passes is that if I comment out the line labeled THIS LINE, memory allocation (via torch.cuda.memory_allocated()) is stable, but if I leave that line in (even if I don’t use h_b anywhere further in the graph), memory allocation increases after each batch, even though the tensors in the graph should have been garbage collected by then. Note that I do use probs_b later on in the graph, and it doesn’t cause a memory leak if I keep the last line of the code block (the loss computation). I think the main difference between that line and the line that causes the problem is that in the loss line probs_b is not combined with a parameter in the graph? Could be wrong though.
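
For context, the numbers I’m describing come from a loop roughly like the following (simplified; here I’m reusing the toy PatternSketch model from above, with random data and an SGD optimizer as stand-ins for my actual training setup):

import torch

device = "cuda"
model = PatternSketch().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for step in range(100):
    inputs = torch.randn(B, D, device=device)
    label = torch.randint(0, 2, (B,), device=device).float()
    optimizer.zero_grad()
    loss = model(inputs, label)
    loss.backward()
    optimizer.step()
    # In my real model this number grows after every batch when THIS LINE is
    # present, and stays flat when it is commented out.
    print(step, torch.cuda.memory_allocated(device))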
Any ideas on what could be going on here, or advice on how to debug it? Even when I’m not using the result of this matmul anywhere, it seems to cause a memory leak, and the equivalent matmul with probs_a doesn’t seem to be a problem, either.
Thanks!