Hey everybody,
I am currently trying to figure out how much memory different models need for a forward pass on the CPU (I know the GPU is much faster ;)). I came across the PyTorch Profiler, but I am having trouble interpreting the results.
With

from torch.profiler import profile, record_function

model.eval()
with profile(activities=activities, record_shapes=True, profile_memory=True) as prof:
    with record_function("model_inference"):
        prediction = model([inp])
print(prof.key_averages().table(top_level_events_only=True, row_limit=10))
I get the following output:
===============================================================================
This report only display top-level ops statistics
Name             Self CPU %  Self CPU   CPU total %  CPU total  CPU time avg  CPU Mem       Self CPU Mem    # of Calls
aten::zeros      0.00%       58.000us   0.31%        18.145ms   1.008ms       98039.65 Kb   -19.59 Kb       18
model_inference  3.15%       182.338ms  100.00%      5.788s     5.788s        592648.17 Kb  -1799765.56 Kb  1

Self CPU time total: 5.788s
I know that Self CPU refers only to the particular function without its child operations, while CPU total includes them, and I assume the memory columns are analogous. However, CPU Mem states 592648.17 Kb, while including the child operations 1799765.56 Kb were released. What does this mean exactly? Can I say that a total of 1799765.56 Kb was needed, or how do I get the total memory required by the whole model? Is this a reasonable approach at all, or should I rather use psutil (e.g. psutil.Process().memory_info().vms)? As far as I know, though, that measures the complete Python process, including imported libraries.
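To illustrate what confuses me about net versus peak memory, here is a minimal stdlib sketch using tracemalloc (no PyTorch involved; the buffer sizes are arbitrary stand-ins for activations and outputs): a function can have a small net allocation after it returns while its peak during execution was much larger.

```python
import tracemalloc

def fake_forward():
    # Allocate a large temporary (like intermediate activations)...
    temp = bytearray(50 * 1024 * 1024)   # ~50 MB scratch buffer
    # ...and return a much smaller result (like the final prediction).
    result = bytearray(1 * 1024 * 1024)  # ~1 MB output
    del temp                             # the temporary is released here
    return result

tracemalloc.start()
prediction = fake_forward()
# get_traced_memory() returns (current, peak) in bytes
net, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"net allocated after call: {net / 1024**2:.1f} MB")   # roughly the result
print(f"peak during call:         {peak / 1024**2:.1f} MB")  # roughly temp + result
```

If I understand it right, the profiler's CPU Mem column is closer to the net value and the negative Self CPU Mem reflects releases, whereas what I actually want for sizing the forward pass would be the peak. Is that interpretation correct?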
I would be happy if someone could help me. Thank you!