How to calculate the activation memory usage of a model

I want to write a module to track and analyze the memory usage of model activations. Here is my current measurement code. Can it give a reasonable estimate of the GPU memory consumed by activations? I suspect the result is only correct if torch.cuda.memory_allocated() synchronizes until the forward pass finishes.


import torch

memory_before_forward = torch.cuda.memory_allocated()
# run the forward pass (keep a reference to the output so it is counted)
output = model(input)
memory_after_forward = torch.cuda.memory_allocated()
print("activation memory (GiB):", (memory_after_forward - memory_before_forward) / 1024**3)

Your code assumes all activations stay on the GPU, which holds for many models but fails if, e.g., CPU offloading is used. Libraries also allocate one-time workspaces (e.g., cuBLAS), so you should run a few warmup iterations before measuring. As for your synchronization question: torch.cuda.memory_allocated() reads a counter maintained by PyTorch's caching allocator on the host, so it is already up to date once the ops have been launched; no device synchronization is needed for the number itself to be correct.
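As a minimal sketch of the counter-based measurement with warmup (the toy model, shapes, and iteration count are placeholders, and a CUDA device is assumed):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)).cuda()
inp = torch.randn(64, 1024, device="cuda")

# Warmup so one-time workspace allocations (cuBLAS, autotuning, etc.)
# are not attributed to activations.
for _ in range(3):
    model(inp)
torch.cuda.synchronize()

torch.cuda.reset_peak_memory_stats()
before = torch.cuda.memory_allocated()
out = model(inp)  # keep the output alive so its memory is counted
after = torch.cuda.memory_allocated()
peak = torch.cuda.max_memory_allocated()

print(f"live after forward: {(after - before) / 1024**3:.4f} GiB")
print(f"peak during forward: {(peak - before) / 1024**3:.4f} GiB")

Note that under torch.no_grad() or in pure inference, intermediates are freed as the forward proceeds, so "live after forward" is mostly the output; with grad enabled, the tensors saved for backward dominate.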
A more precise approach is to record the number of elements and dtype of every forward activation as it is computed, for example with a tensor subclass.
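One concrete sketch of that idea, using torch.autograd.graph.saved_tensors_hooks instead of a full tensor subclass (the hooks API directly exposes every tensor autograd saves for backward); the toy model, the storage-based de-duplication, and the parameter-skipping heuristic below are all illustrative assumptions:

import torch
import torch.nn as nn

class SavedActivationTracker:
    """Records shape/dtype/bytes of tensors saved for backward during forward."""

    def __init__(self):
        self.records = []           # (shape, dtype, bytes) per saved tensor
        self.seen_storages = set()  # de-duplicate views sharing one storage

    def pack(self, t):
        # Heuristic: skip parameters, which autograd also saves
        # (e.g. Linear saves its weight) but which are not activations.
        if isinstance(t, nn.Parameter):
            return t
        key = t.untyped_storage().data_ptr()
        if key not in self.seen_storages:
            self.seen_storages.add(key)
            self.records.append((tuple(t.shape), t.dtype, t.numel() * t.element_size()))
        return t

    def unpack(self, t):
        return t

    @property
    def total_bytes(self):
        return sum(nbytes for _, _, nbytes in self.records)

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()
inp = torch.randn(64, 1024, device="cuda", requires_grad=True)

tracker = SavedActivationTracker()
with torch.autograd.graph.saved_tensors_hooks(tracker.pack, tracker.unpack):
    out = model(inp)

print(f"saved activation memory: {tracker.total_bytes / 1024**3:.4f} GiB")
for shape, dtype, nbytes in tracker.records:
    print(shape, dtype, nbytes)

This counts exactly what is kept alive for the backward pass, independently of where the tensors live, so it also works when activations are offloaded to the CPU.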