Why do several models in one Python kernel take so little memory?

Hi all!
I encountered some strange GPU memory behavior and can't figure out why it happens.
I have created five NN architectures in a single Python kernel, so there are 5 instances of the classes net_1, …, net_5, which all take the same batches as input. A single NN takes approximately 600 MB of GPU memory according to the nvidia-smi log. I expected five of them to take ~3 GB, but they only take ~1 GB… I can save the model weights and load them for each model, everything seems to work, and training goes well. But I still can't explain why an additional NN of about 5 million parameters takes so little space. My guess was that some strange weight copying happens, or… I don't know.
Every model obviously has its own separate optimizer; only the training flow is the same.
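For reference, the setup looks roughly like this (a minimal sketch; the toy Net below is only a placeholder for my real ~5-million-parameter architectures, and the layer and batch sizes are made up):

```python
import torch
import torch.nn as nn

device = torch.device("cuda")

# Placeholder for my real net_1 ... net_5 classes (~5M parameters each).
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(1024, 2048), nn.ReLU(), nn.Linear(2048, 1024)
        )

    def forward(self, x):
        return self.body(x)

models = [Net().to(device) for _ in range(5)]
optimizers = [torch.optim.Adam(m.parameters()) for m in models]

batch = torch.randn(64, 1024, device=device)
for model, opt in zip(models, optimizers):
    opt.zero_grad()
    loss = model(batch).pow(2).mean()  # dummy loss just to drive a backward pass
    loss.backward()
    opt.step()

# memory_allocated() counts only the tensors PyTorch itself has allocated on this device.
print(torch.cuda.memory_allocated(device) / 1024**2, "MB allocated by tensors")
```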
Can anyone explain what’s going on and whether I made a mistake somewhere?
Thanks in advance.

CUDA uses some memory for its context, which is usually created on the first CUDA call.
On my system this takes approx. 400 MB on the GPU.

Have a look at the memory usage before importing PyTorch, then call torch.cuda.init() and watch the growth.
This means that the 1 GB isn't used entirely by the 5 models.
Could you check this on your machine?
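A quick check could look like the sketch below (assuming nvidia-smi is on the PATH and no other process is allocating GPU memory at the same time; the exact numbers depend on driver and GPU):

```python
import subprocess

def used_mb():
    # Total GPU memory currently in use (MB) as reported by nvidia-smi.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]
    )
    return int(out.decode().splitlines()[0])

before = used_mb()

import torch
torch.cuda.init()                 # creates the CUDA context
x = torch.ones(1, device="cuda")  # first real allocation, forces full initialization

after = used_mb()
print(f"CUDA context / init overhead: ~{after - before} MB")
```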


Thank you very much for your reply.
I rechecked everything, and now it seems that everything works as it should. 5,000,000 float32 parameters should take about 20 MB, and I guess it's my fault that I couldn't add the numbers up >_<. So in total it's roughly 600 MB for the PyTorch/CUDA initialization plus about 100 MB for each net with its additional data (batches etc.).
So PyTorch works perfectly 🙂
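For anyone doing the same arithmetic later, the weight memory can be checked directly; a small sketch (the nn.Sequential here is just a stand-in for a real net):

```python
import torch.nn as nn

# Stand-in model; replace with any of the real nets.
model = nn.Sequential(nn.Linear(1024, 2048), nn.ReLU(), nn.Linear(2048, 1024))

n_params = sum(p.numel() for p in model.parameters())
bytes_per_param = 4  # float32
print(f"{n_params:,} parameters ≈ {n_params * bytes_per_param / 1024**2:.1f} MB of weights")

# e.g. 5,000,000 float32 parameters * 4 bytes ≈ 20 MB;
# gradients and optimizer state (e.g. Adam's moment buffers) add a few multiples of that.
```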