Why does GPU memory usage not double when loading two identical models in PyTorch?

Hi everyone,

I have a question regarding GPU memory usage when loading identical models in PyTorch.

I am working on a project where I need to load the same model twice onto the GPU. However, I noticed that when I load the second model, the GPU memory usage only increases slightly, instead of doubling as I expected.
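Roughly, the setup looks like this (a simplified sketch; torchvision's resnet50 is just a stand-in for my actual model):

```python
import torch
import torchvision.models as models

def allocated_mb():
    # Tensor memory currently allocated on the GPU, in MB
    return torch.cuda.memory_allocated() / 1024**2

device = torch.device("cuda")

# Load the same architecture twice, as two independent instances
model_1 = models.resnet50().to(device)
print(f"after model_1: {allocated_mb():.1f} MB")

model_2 = models.resnet50().to(device)
print(f"after model_2: {allocated_mb():.1f} MB")
```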

Does PyTorch share memory between identical models? If so, is this an optimization, or a limitation of how PyTorch handles GPU memory?

I would greatly appreciate it if someone could explain this behavior in detail or point me to relevant resources.

Thanks in advance!

As a follow-up: I tried using threading to load the two identical models, with one thread loading model_1 and the other loading model_2. Interestingly, in that case the GPU memory usage did double. What might explain the difference?
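The threading version looked roughly like this (again a simplified sketch, with resnet50 standing in for my real model):

```python
import threading

import torch
import torchvision.models as models

device = torch.device("cuda")
loaded = {}

def load_model(name):
    # Each thread builds its own independent model instance on the GPU
    loaded[name] = models.resnet50().to(device)

threads = [threading.Thread(target=load_model, args=(name,))
           for name in ("model_1", "model_2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MB")
```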

How do you measure the memory usage? Are you checking the allocated memory via torch.cuda.memory_allocated(), or a number that also includes cached memory, such as torch.cuda.memory_reserved() or the figure reported by nvidia-smi?
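For example (a minimal sketch, again using resnet50 as a placeholder model):

```python
import torch
import torchvision.models as models

model = models.resnet50().to("cuda")

allocated = torch.cuda.memory_allocated() / 1024**2  # memory occupied by live tensors
reserved = torch.cuda.memory_reserved() / 1024**2    # memory held by PyTorch's caching allocator

print(f"allocated: {allocated:.1f} MB, reserved: {reserved:.1f} MB")
```

Note that nvidia-smi reports roughly the reserved pool plus the CUDA context, so it can move very little even when the allocated amount changes.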