As the title says. It seems that nvidia-smi reports more reserved GPU memory when the model class inherits from another nn.Module subclass.
I created a class to implement my algorithm using torch.utils.checkpoint, and then found that I needed to share the implementation with a new model. So I extracted it into a base class that inherits from nn.Module. My original model class is now much simpler, but I noticed that the GPU memory usage reported by nvidia-smi is much higher than before: from 14450 MiB to an almost full 22868 MiB on an NVIDIA GeForce RTX 3090.
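For context, the refactor looks roughly like the sketch below (simplified; the class and method names here are made up, not the actual code):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedBase(nn.Module):
    """Shared base class that routes submodules through torch.utils.checkpoint."""

    def run_checkpointed(self, block, *inputs):
        # use_reentrant=False is the recommended mode in recent PyTorch versions
        return checkpoint(block, *inputs, use_reentrant=False)


class MyModel(CheckpointedBase):
    def __init__(self, dim=1024):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, x):
        # The checkpointed block recomputes activations in the backward pass
        # instead of storing them during forward.
        return self.run_checkpointed(self.block, x)
```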
What could be the reason(s)? Is it possible to make the value lower?
Measuring memory usage via nvidia-smi while you are using the CUDA caching allocator is going to be noisy: what you are really observing is the total CUDA memory reserved, which is not necessarily in use by live tensors, and this reserved memory is also subject to fragmentation effects (which are affected by your PYTORCH_CUDA_ALLOC_CONF setting).
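For a less noisy picture than nvidia-smi, you can compare what the allocator has actually handed out to live tensors against what it has reserved from the driver, for example along these lines:

```python
import torch


def report_cuda_memory(tag=""):
    # Memory currently occupied by live tensors.
    allocated = torch.cuda.memory_allocated() / 2**20
    # Memory the caching allocator has reserved from the driver;
    # this is roughly what nvidia-smi attributes to the process,
    # minus CUDA context overhead.
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"{tag} allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")


report_cuda_memory("before empty_cache:")
# Return unused cached blocks to the driver; the nvidia-smi number should
# drop, but this does not change how much memory live tensors need.
torch.cuda.empty_cache()
report_cuda_memory("after empty_cache:")
```

If the allocated number is similar between the two versions of your model but the reserved number is much larger, the difference is cached/fragmented allocator memory rather than real extra usage; torch.cuda.memory_summary() gives a more detailed breakdown, and experimenting with PYTORCH_CUDA_ALLOC_CONF (e.g. max_split_size_mb) can change how much gets reserved.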