Memory Leak When Replacing torch.nn.NLLLoss() with torch.nn.CrossEntropyLoss()

RylanSchaeffer · June 28, 2020, 10:25pm

I’m trying to refactor my code and now I’ve discovered that if I replace torch.nn.NLLLoss() with torch.nn.CrossEntropyLoss(), my code crashes from a memory error. Negative log likelihood by itself is fine. I’ve been debugging for hours and I have no ideas. Does anyone have any suggestions for what’s causing this?

I know there’s a memory leak because, using the following code, I can watch the number of non-garbage-collected tensors climb:

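    import gc
    import torch

    count = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) or (hasattr(obj, 'data') and torch.is_tensor(obj.data)):
                print(type(obj), obj.size())
                count += 1
        except Exception:
            pass  # some tracked objects raise on attribute access; skip them
    print(count)  # number of live tensors the garbage collector can see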

ptrblck · June 29, 2020, 9:11am

nn.CrossEntropyLoss calls into F.cross_entropy, which in turn uses nll_loss(log_softmax()), so the two calls should be equivalent.
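The equivalence can be checked numerically. The snippet below is a minimal sketch; the shapes, seed, and values of `logits` and `targets` are made up for illustration and are not from the original poster's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # hypothetical batch: 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])  # hypothetical class indices

# CrossEntropyLoss takes raw logits.
ce = nn.CrossEntropyLoss()(logits, targets)

# NLLLoss expects log-probabilities, so log_softmax is applied first.
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

same = torch.allclose(ce, nll)
print(same)
```

If the two losses agree like this, any difference in memory behaviour is more likely to come from how the result is used afterwards than from the loss function itself.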

Could you post a code snippet to reproduce this issue?

RylanSchaeffer · June 29, 2020, 1:36pm

I found the issue. I was storing the loss returned by the function call in a tensor, and for some reason it was never getting freed. More interestingly, memory was consumed more quickly with cross entropy, perhaps because it consists of two operations instead of one, which would explain why NLL appeared to have no memory leak even though it did.
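One common way this kind of growth happens is keeping loss tensors that still carry their autograd graph. The thread does not confirm that this was the exact cause here, so the following is only a sketch under that assumption. The model, the data, and the names `leaky_history` and `safe_history` are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 3)           # hypothetical stand-in model
criterion = nn.CrossEntropyLoss()

leaky_history = []  # holds loss tensors, each still attached to its graph
safe_history = []   # holds plain Python floats only

for step in range(5):
    inputs = torch.randn(8, 10)
    targets = torch.randint(0, 3, (8,))
    loss = criterion(model(inputs), targets)

    # Storing the tensor keeps the whole computation graph for this step alive.
    leaky_history.append(loss)

    # .item() copies the scalar out as a float, so the graph can be released.
    safe_history.append(loss.item())

leaky_has_graph = all(l.grad_fn is not None for l in leaky_history)
safe_is_float = all(isinstance(v, float) for v in safe_history)
print(leaky_has_graph, safe_is_float)
```

Under this assumption, a loss built from more operations retains a larger graph per stored tensor, which would be consistent with cross entropy consuming memory faster than NLL alone.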

What are the rules by which a tensor is garbage collected? Why does overwriting a tensor not free it?

ptrblck · June 30, 2020, 1:56am

Could you post a small code snippet which would reproduce the memory leak?
Tensors should be freed once all references to them are deleted.
If an object still points to the tensor, it cannot be freed, but I’m unsure what your exact use case is.
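The reference rule above can be illustrated with a weak reference, which observes an object without keeping it alive. This is a minimal sketch that assumes CPython's reference counting; the names `loss`, `history`, and `probe` are hypothetical.

```python
import weakref
import torch

loss = torch.zeros(1000)    # hypothetical tensor standing in for a stored loss
probe = weakref.ref(loss)   # weak reference: does not keep the tensor alive
history = [loss]            # a second, strong reference held by a container

# Rebinding the name drops only one reference; the list still holds the old tensor.
loss = torch.ones(1000)
alive_after_overwrite = probe() is not None

# Dropping the last strong reference lets the old tensor be freed.
history.clear()
alive_after_release = probe() is not None

print(alive_after_overwrite, alive_after_release)
```

This is why overwriting a variable does not necessarily free the tensor it used to name: any list, dict, attribute, or autograd graph that still references the tensor keeps it alive.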