Measuring peak memory usage: tracemalloc for pytorch?

@SimonW, I have been thinking about the solution you implemented, and there is a need for scoped memory measurements, where scopes can overlap or be nested.

Scenario 1: an application relying on the normal (pre-reset implementation) behavior of max_memory_allocated or max_memory_cached could now malfunction if some other application resets either or both (action at a distance).

Scenario 2: two profilers measuring different scopes, say one measuring at the function level, another at a wider or narrower scope. Since there is only one counter, they will keep resetting each other's measurements. Python's tracemalloc has the same issue, since it exposes a single global counter rather than a separate counter object per measurement.
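To make the clash concrete, here is a minimal sketch of Scenario 2 using CPU-side tracemalloc (tracemalloc.reset_peak() requires Python 3.9+): an inner scope resetting the single shared peak counter wipes out the spike the outer scope needed to report.

```python
import tracemalloc

tracemalloc.start()

# Outer "scope" starts measuring here; a large temporary
# allocation spikes the traced peak (~8 MB for a 1M-element list).
spike = [0] * 1_000_000
del spike  # freed, but the global peak counter still remembers it

# An inner scope now resets the one shared peak counter
# (Python 3.9+), destroying the outer scope's record of the spike.
tracemalloc.reset_peak()

# The outer scope reads its "peak" afterwards and misses the spike.
_, outer_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
```

The same action-at-a-distance happens with the CUDA counters whenever one profiler calls a reset inside another profiler's scope.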

The 2nd scenario is not hypothetical; it's an actual need I have right now, as I have different profilers measuring different scopes. They are in different applications, so they can't really communicate with each other to stay in sync over reset calls. E.g., I have one profiler running at the train-loop epoch level, another at the Jupyter cell level, and yet another over larger parts of the notebook. And unfortunately, my current measuring-thread approach is clearly failing to catch all the peaks :frowning: so it'd be extremely helpful to be able to switch to max_memory_allocated, yet with different instances of it in different scopes.

So I need to be able to do something like:

```python
max_obj1 = MaxMemoryAllocated()
# run some code 1
for epoch in epochs:
    max_obj2 = MaxMemoryAllocated()
    # run epoch code
    peak_epoch = max_obj2.peak()
# run some code ...
peak = max_obj1.peak()
del max_obj1
```

Of course, these would be unrelated applications; the code sample just demonstrates how their execution can overlap and why the current implementation is insufficient.
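For completeness, here is a rough sketch of one way per-instance counters could be layered on top of a single global peak counter. I'm prototyping it with CPU-side tracemalloc (reset_peak() is Python 3.9+) since that I can do in pure Python; the PeakScope class and its methods are just my illustration, not an existing or proposed API. The idea: keep a registry of live scope objects, and before any reset, fold the global peak so far into every live scope.

```python
import tracemalloc

class PeakScope:
    """Per-instance peak tracker layered over the single global
    tracemalloc peak counter (illustrative sketch only)."""
    _active = []  # all live scopes sharing the one global counter

    def __init__(self):
        # Fold the peak accumulated so far into existing scopes,
        # then start this scope's own accounting from zero.
        PeakScope._sync()
        self._peak = 0
        PeakScope._active.append(self)

    @classmethod
    def _sync(cls):
        # Read the shared peak, credit it to every live scope,
        # then reset the global counter (Python 3.9+). Resets now
        # happen only here, so scopes never clobber each other.
        _, peak = tracemalloc.get_traced_memory()
        for scope in cls._active:
            scope._peak = max(scope._peak, peak)
        tracemalloc.reset_peak()

    def peak(self):
        PeakScope._sync()
        return self._peak
```

With something along these lines, the epoch-level, cell-level, and notebook-level profilers could each hold their own scope object, and a reset triggered by one would be credited to all of them instead of silently discarding their measurements.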