Non-invasive GPU memory profiling


Does anyone have any recommendations on how to profile GPU memory in a non-invasive fashion?

Some options seem to be:

- nvidia-smi with memory monitoring (sampling based, so it seems to miss peak usage, among other shortcomings),
- nvprof with memory trace (seems too slow),
- nvprof with API trace (doesn't report allocation amounts and doesn't account for fragmentation), or
- a python-level source-invasive solution using gc, previously suggested in a separate thread here:
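For reference, the nvidia-smi route can be driven from Python roughly like this. This is just a sketch: `sample_gpu_memory`, the sample count, and the interval are my own choices, and being sampling based it will still miss short-lived peaks between samples.

```python
import subprocess
import time

def sample_gpu_memory(n_samples=5, interval_s=0.5):
    """Poll nvidia-smi for used/total GPU memory (hypothetical helper).

    Sampling based, so allocations that come and go between two samples
    are invisible -- one of the shortcomings mentioned above.
    """
    for _ in range(n_samples):
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True)
        print(out.stdout.strip())
        time.sleep(interval_s)
```

Raising the sampling rate narrows the blind spots but never removes them, which is why an allocator-level hook would be preferable.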

One simple solution would be to instrument the code around every cudaMalloc call in the PyTorch source and check available memory, but PyTorch currently doesn't expose a callback for tracing these calls.

You can get some information from these functions, which report the current state of the allocated memory, the caching allocator, and the peak usage: max_memory_allocated, max_memory_cached, memory_allocated and memory_cached.
Or did you want more precise profiling?
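A minimal sketch of using those four counters around a region of interest (the wrapper `report_cuda_memory` and the tensor size are my own; the `torch.cuda` functions are the ones named above):

```python
import torch

def report_cuda_memory(tag=""):
    # Snapshot of the caching allocator's counters, in MB.
    # memory_allocated: bytes held by live tensors;
    # memory_cached: bytes held by the caching allocator (>= allocated,
    # since freed blocks are cached rather than returned to the driver);
    # the max_* variants track the peak since the start of the program.
    mb = 1024 ** 2
    print(f"[{tag}] allocated: {torch.cuda.memory_allocated() / mb:.1f} MB "
          f"(peak {torch.cuda.max_memory_allocated() / mb:.1f} MB), "
          f"cached: {torch.cuda.memory_cached() / mb:.1f} MB "
          f"(peak {torch.cuda.max_memory_cached() / mb:.1f} MB)")

if torch.cuda.is_available():
    x = torch.randn(1024, 1024, device="cuda")  # ~4 MB of fp32
    report_cuda_memory("after alloc")
    del x
    report_cuda_memory("after free")  # cached stays up, allocated drops
```

Because these are exact allocator counters rather than samples, the max_* values capture the true peak, which the nvidia-smi approach can miss.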


Thank you! I think this should work.