How to profile CUDA memory usage for each part of a model

Referring to the PyTorch profiler, it seems to only trace CPU memory rather than GPU memory. Is there any tool to trace CUDA memory usage for each part of a model?

Try Stonesjtu/pytorch_memlab on GitHub (profiling and inspecting memory in PyTorch), though it may be easier to just manually wrap some code blocks and measure usage deltas (of torch.cuda.memory_allocated).
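A minimal sketch of the manual-wrapping approach: `measure_cuda_delta` is a hypothetical helper (not part of PyTorch or pytorch_memlab) that compares `torch.cuda.memory_allocated` before and after running a code block.

```python
import torch

def measure_cuda_delta(fn, device="cuda"):
    """Run fn() and print how much CUDA memory it left allocated.

    Hypothetical helper: measures the delta of
    torch.cuda.memory_allocated around a code block.
    """
    torch.cuda.synchronize(device)
    before = torch.cuda.memory_allocated(device)
    result = fn()
    torch.cuda.synchronize(device)
    after = torch.cuda.memory_allocated(device)
    print(f"allocated delta: {(after - before) / 1024**2:.2f} MiB")
    return result

if torch.cuda.is_available():
    # e.g. measure the persistent allocation left by one block
    x = measure_cuda_delta(lambda: torch.randn(1024, 1024, device="cuda"))
```

Note this only captures memory still allocated when the block returns; temporary peaks inside the block would need `torch.cuda.max_memory_allocated` plus `torch.cuda.reset_peak_memory_stats`.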

Thanks for your reply, I'll try it.
Is there an official PyTorch profiler for GPU memory?

afaik, it only has torch.profiler.profile(profile_memory=True) as an aggregator, I'm not sure if that produces useful results (there is the undocumented autograd.profiler.record_function("X") to mark code blocks)…
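To illustrate how the two combine, here is a small CPU-only sketch: `record_function` labels a region, and the profiler's aggregated table then shows per-label memory columns. The model and shapes are just placeholders.

```python
import torch
from torch.profiler import profile, record_function

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)
x = torch.randn(32, 128)

with profile(profile_memory=True) as prof:
    # label this block so it appears as a named row in the table
    with record_function("forward_pass"):
        y = model(x)

# aggregated stats, sorted by self CPU memory usage
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))
```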

Thanks! torch.profiler.profile(profile_memory=True) seems to only report CPU memory usage; I might have to find another way.

there are options for CUDA (version-dependent, so check the docs)
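For example (assuming a PyTorch version with `torch.profiler`, 1.8.1 or later), passing `ProfilerActivity.CUDA` alongside `profile_memory=True` records device-side allocations, which then show up in the CUDA memory columns of the summary table:

```python
import torch
from torch.profiler import profile, ProfilerActivity

if torch.cuda.is_available():
    model = torch.nn.Linear(512, 512).cuda()
    x = torch.randn(64, 512, device="cuda")

    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        profile_memory=True,
    ) as prof:
        y = model(x)

    # CUDA allocations appear in the "CUDA Mem" / "Self CUDA Mem" columns
    print(prof.key_averages().table(
        sort_by="self_cuda_memory_usage", row_limit=5))
```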