Can the profiler detect a memory leak across epochs with self_cpu_memory_usage?

rinkujadhav2013 · January 12, 2023, 4:30pm

My code has a host memory leak, possibly caused by tensors that stay attached to the graph inside a custom autograd function when they shouldn’t be. Can I detect such a leak using PyTorch’s profiler? Considering the example listed in the tutorial here: PyTorch Profiler — PyTorch Tutorials 1.12.1+cu102 documentation

with profile(activities=[ProfilerActivity.CPU],
        profile_memory=True, record_shapes=True) as prof:
    model(inputs)  # the profiled step; model and inputs as defined in the tutorial

print(prof.key_averages())

would the code above print higher memory usage values across epochs if there’s a leak?

ptrblck · January 12, 2023, 9:29pm

I don’t think checking the profiler output would help in this case, as it would show the memory usage of each operation, which is unrelated to storing tensors attached to a computation graph.
This code snippet should illustrate it:

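import torch
import torchvision.models as models
from torch.profiler import profile, ProfilerActivity

model = models.resnet18()
x = torch.randn(1, 3, 224, 224)

outputs = []

for i in range(10):
    with profile(activities=[ProfilerActivity.CPU], profile_memory=True, record_shapes=True) as prof:
        out = model(x)
        # Keeping `out` also keeps its computation graph alive, so host memory grows each iteration.
        outputs.append(out)

    # The per-operator numbers describe the forward pass only; they do not reflect what `outputs` retains.
    print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10))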

Also note that you are not seeing a memory leak (which would indicate that memory is lost and cannot be freed anymore), but an expected increase of memory usage since you are explicitly storing tensors including their attached computation graph.

rinkujadhav2013 · January 12, 2023, 9:35pm

Thanks!
The follow-up question is then: can I somehow get a total tensor count, or something similar, so that I can check whether it’s growing with each epoch?
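
A minimal sketch of one way such a count could be obtained, assuming the retained tensors are reachable as Python objects tracked by the garbage collector: walk gc.get_objects() once per epoch and group the tensors by shape and dtype. The helper names count_live, live_tensor_counts and report_growth are hypothetical and not part of PyTorch; only gc.get_objects() and torch.is_tensor() are existing calls.

```python
import gc
from collections import Counter


def count_live(is_target, describe):
    """Count gc-tracked objects accepted by is_target, grouped by describe(obj)."""
    counts = Counter()
    for obj in gc.get_objects():
        try:
            if is_target(obj):
                counts[describe(obj)] += 1
        except Exception:
            # Some objects raise when inspected (e.g. dead weak proxies); skip them.
            continue
    return counts


def live_tensor_counts():
    """Group all tensors currently visible to the garbage collector by shape and dtype."""
    import torch  # imported lazily so count_live itself has no PyTorch dependency

    return count_live(torch.is_tensor, lambda t: (tuple(t.shape), str(t.dtype)))


def report_growth(previous, current, top=5):
    """Print the groups whose count increased since the previous snapshot."""
    growth = current - previous  # Counter subtraction keeps only positive differences
    for key, extra in growth.most_common(top):
        print(f"+{extra} {key}")
    return growth
```

Calling live_tensor_counts() at the end of every epoch and passing consecutive snapshots to report_growth() would show which shapes keep accumulating. Model parameters and buffers are counted too, so the growth between epochs matters rather than the absolute number. Tensors held only on the C++ side (for example, values saved for backward inside a graph) may not show up in this count.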

pujaltes · April 24, 2024, 5:28pm

Were you able to solve this?

rinkujadhav2013 · June 30, 2024, 6:46pm

Nope. I had to work around it somehow. My code might still have a memory leak (or not), but it’s so small that it doesn’t really affect anything during training, so I’m okay.