A6000 uses more memory than Titan

I just upgraded my HW from a Titan to an A6000 and noticed something unexpected. On the Titan, nvidia-smi would report GPU memory usage around 21-23 GB (I can't remember the exact number, but the Titan has 24 GB, so it had to be less than that). Now the same code is using 38 GB on the A6000.

My dataloader does not know which HW I am using, so its parameters are fixed.

The only other thing I changed was upgrading the Dockerfile from pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime to pytorch/pytorch:1.7.1-cuda11.0-cudnn8-runtime, since the A6000 was not happy with the original.

Is PyTorch just leaving stuff in memory since there is room, or is there a default data structure that changed size between PyTorch versions?

You could check the memory usage via print(torch.cuda.memory_summary()) to see how much memory is allocated and how much is reserved, i.e. sitting in the cache and available for reuse.
I would assume the cache is filling up; if you want to free this memory so that other applications can use it, you can call torch.cuda.empty_cache().
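A minimal sketch of the difference between allocated and reserved memory (the tensor shapes here are just a hypothetical workload):

```python
import torch

# Create a throwaway tensor so the allocator has something to track.
x = torch.randn(1024, 1024, device="cuda")

print(torch.cuda.memory_allocated() / 1024**2, "MiB allocated")  # memory held by live tensors
print(torch.cuda.memory_reserved() / 1024**2, "MiB reserved")    # allocated + cached by the allocator

del x
torch.cuda.empty_cache()  # return cached blocks to the driver; nvidia-smi should drop afterwards

print(torch.cuda.memory_summary())  # full per-device breakdown of the allocator state
```

Note that nvidia-smi shows the reserved number (plus the CUDA context), so a larger reserved pool on the A6000 would look like higher usage even if the tensors themselves are the same size.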

Thank you. I'm using pytorch-lightning, so it handles all things CUDA.
I will, however, add print(torch.cuda.memory_summary()) as you suggested
so that I can track actual usage.
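In case it helps anyone else: one way to do this with Lightning is a small callback that prints the summary once per epoch. This is just a sketch; the callback class name is made up, and the exact hook names/signatures can differ between Lightning versions, so check the docs for the version you have installed.

```python
import torch
import pytorch_lightning as pl

class CudaMemoryReport(pl.Callback):
    """Print the CUDA allocator summary at the end of each training epoch."""

    def on_train_epoch_end(self, trainer, pl_module):
        # Only meaningful when running on a CUDA device.
        if torch.cuda.is_available():
            print(torch.cuda.memory_summary())

# Hypothetical usage: pass it to the Trainer alongside any existing callbacks.
# trainer = pl.Trainer(callbacks=[CudaMemoryReport()])
```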