Hello,
I have been profiling LLMs from HuggingFace, and I have always assumed that I could trust torch.profiler and its self_device_memory_usage metric.
To double-check the result, I started using psutil, and for a small model it reports that I have used 100 MB.
torch.profiler, on the other hand, reports only 21 MB when I sum the self_device_memory_usage of every event.
What am I missing? Shouldn’t the two results be more or less the same?
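
To make the comparison concrete, here is a minimal sketch of the kind of measurement I mean. It is an illustration rather than my exact script: the helper names (total_self_device_memory, to_mb, measure) are made up for this post, run_forward stands for any zero-argument callable that runs one forward pass, and I am assuming the psutil figure is the change in the process's resident set size around that pass.

```python
import os


def total_self_device_memory(events):
    """Sum self_device_memory_usage (in bytes) over a list of profiler events.

    Events that lack the attribute are counted as 0.
    """
    return sum(getattr(evt, "self_device_memory_usage", 0) for evt in events)


def to_mb(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 ** 2)


def measure(run_forward):
    """Return (psutil_mb, profiler_mb) for one call of run_forward().

    run_forward is a hypothetical zero-argument callable, for example
    lambda: model(**inputs). Imports are local so the helpers above can be
    used without torch or psutil installed.
    """
    import psutil
    import torch
    from torch.profiler import ProfilerActivity, profile

    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)

    process = psutil.Process(os.getpid())
    # Resident set size of this Python process (host memory), in bytes.
    rss_before = process.memory_info().rss

    with profile(activities=activities, profile_memory=True) as prof:
        with torch.no_grad():
            run_forward()

    rss_after = process.memory_info().rss

    psutil_mb = to_mb(rss_after - rss_before)
    profiler_mb = to_mb(total_self_device_memory(prof.events()))
    return psutil_mb, profiler_mb
```

With a loaded model and tokenized inputs, this would be called as psutil_mb, profiler_mb = measure(lambda: model(**inputs)), and the two returned numbers are the ones I am comparing.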
Thank you in advance for the help!