Questions about GPU memory usage

I’m trying to profile a model’s memory usage using this tutorial: Understanding GPU Memory 1: Visualizing All Allocations over Time | PyTorch. Looking at the output, almost all of the memory usage is listed as Unknown (screenshot attached). When I step through the code while watching nvidia-smi, the biggest increase in memory comes during the model’s forward pass. Does anyone have suggestions on how to debug this further? I can post my code, but my model/dataset are spread over several files; aside from the model/dataset import, I follow the code in Appendix B of the link above exactly (simplified sketch below). Is there a more methodical way to find out which parts of my model contribute the most to memory usage?

Similarly, here’s the screenshot showing memory by tensor. Since I don’t see an obvious way to associate those allocations with specific tensors in the model, I’m not sure how to use this view to debug further (I’m using four random batches with different sequence lengths).
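For context, here’s a simplified sketch of what I’m running. `MyModel` and `get_batches` are stand-ins for my own model/dataset code (which is spread over several files); the recording and snapshot calls are the ones from the tutorial:

```python
import torch
from torch import nn

# Stand-ins for my own code, which lives in several files
from my_model import MyModel        # hypothetical import
from my_data import get_batches     # hypothetical import

device = torch.device("cuda")
model = MyModel().to(device)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()

# Start recording allocation history with stack traces
torch.cuda.memory._record_memory_history(max_entries=100_000)

for inputs, targets in get_batches(num_batches=4):
    inputs, targets = inputs.to(device), targets.to(device)
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)

# Dump the snapshot for pytorch.org/memory_viz, then stop recording
torch.cuda.memory._dump_snapshot("snapshot.pickle")
torch.cuda.memory._record_memory_history(enabled=None)
```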

@rkd1137 There is no easy way to do this, but I can share a few general pointers.

There is a good article: Understanding CUDA Memory Usage — PyTorch main documentation
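That page also documents the snapshot format. If you prefer to dig into it programmatically rather than through the visualizer, something along these lines can list the largest live allocations together with their Python stacks. This is only a sketch and assumes the `segments` / `blocks` / `frames` layout described on that page, and that stack recording was enabled:

```python
import pickle

# Load the snapshot dumped by torch.cuda.memory._dump_snapshot(...)
with open("snapshot.pickle", "rb") as f:
    snapshot = pickle.load(f)

# Collect all currently-allocated blocks across the cached segments
blocks = [
    block
    for segment in snapshot["segments"]
    for block in segment["blocks"]
    if block["state"] == "active_allocated"
]

# Print the largest allocations with the top of their allocation stack,
# which is usually enough to map them back to a module or operation
for block in sorted(blocks, key=lambda b: b["size"], reverse=True)[:10]:
    print(f"{block['size'] / 1024**2:.1f} MiB")
    for frame in block.get("frames", [])[:5]:
        print(f"    {frame['filename']}:{frame['line']} {frame['name']}")
```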

  1. Check whether certain allocations persist across iterations. These are probably holding onto memory without being part of the training cycle.

  2. If the stack shows *_backward entries, those allocations are part of backward propagation.

  3. If the stack shows just a plain operation like a convolution, and the allocations increase up to a point, they are part of the forward flow for each epoch.
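If you want a more methodical per-module breakdown than reading stacks, one option is to hook every module and measure how much CUDA memory the allocator reports before and after its forward. This is only a rough sketch (the helper below is hypothetical, not a PyTorch API): parent modules include their children’s allocations, and the caching allocator makes the numbers approximate, but it usually points at the layers that dominate activation memory.

```python
import torch
from torch import nn

def attach_memory_hooks(model: nn.Module):
    """Record the change in allocated CUDA memory across each module's forward."""
    stats = {}

    def make_hooks(name):
        def pre_hook(module, inputs):
            torch.cuda.synchronize()
            stats[name] = -torch.cuda.memory_allocated()

        def post_hook(module, inputs, output):
            torch.cuda.synchronize()
            stats[name] += torch.cuda.memory_allocated()

        return pre_hook, post_hook

    handles = []
    for name, module in model.named_modules():
        if name:  # skip the root module itself
            pre, post = make_hooks(name)
            handles.append(module.register_forward_pre_hook(pre))
            handles.append(module.register_forward_hook(post))
    return stats, handles

# Usage: run one forward pass, then inspect the biggest offenders
# stats, handles = attach_memory_hooks(model)
# model(batch)
# for name, delta in sorted(stats.items(), key=lambda kv: -kv[1])[:10]:
#     print(f"{name}: {delta / 1024**2:.1f} MiB")
# for h in handles:
#     h.remove()
```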