Why do several models in one Python kernel take so little memory?


(Ivan Markov) #1

Hi all!
I encountered some strange GPU behavior and can't figure out why it happens.
I have created five NN architectures in a single Python kernel, so there are 5 instances of the classes net_1, …, net_5, which take the same batches as input. One NN takes approximately 600 MB of GPU memory according to the nvidia-smi log. I thought that five would take ~3 GB, but they take only ~1 GB… I can save the model weights and load them for each model; everything seems to work and training goes well. But I still can't explain why an additional NN with about 5 million parameters takes so little space. My guess was that some strange weight copying happens, or… I dunno.
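A minimal sketch of what I mean (the Net class here is just a hypothetical stand-in with roughly 5M parameters, not my actual architecture), measuring what each model adds from PyTorch's point of view:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for one of the five networks (~5M parameters).
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(1024, 2048), nn.ReLU(),
            nn.Linear(2048, 1024), nn.ReLU(),
            nn.Linear(1024, 512),
        )

    def forward(self, x):
        return self.layers(x)

device = torch.device("cuda")
models = []
for i in range(5):
    before = torch.cuda.memory_allocated(device)
    models.append(Net().to(device))
    after = torch.cuda.memory_allocated(device)
    print(f"net_{i + 1}: {(after - before) / 1024**2:.1f} MB for parameters/buffers")

# nvidia-smi reports this plus the CUDA context and the caching allocator's reserve,
# so the number it shows is larger than the sum printed above.
print(f"total allocated: {torch.cuda.memory_allocated(device) / 1024**2:.1f} MB")
print(f"total reserved:  {torch.cuda.memory_reserved(device) / 1024**2:.1f} MB")
```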
Every model obviously has its own separate optimizer; only the training flow is the same.
Can anyone explain what’s going on and whether I made a mistake somewhere?
Thanks in advance.


#2

CUDA uses some memory for its context. This is usually done in the first CUDA call.
On my system this takes approx. 400 MB on the GPU.

Have a look at the memory usage before importing PyTorch, then call torch.cuda.init() and see the growth.
This means that the 1 GB isn't used entirely by the 5 models.
Could you check this on your machine?
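A rough sketch of that check (assuming nvidia-smi is on the PATH; the diff also includes anything else running on the GPU, so read it as an upper bound on the context size):

```python
import subprocess

def gpu_mem_used_mib():
    # Query the driver directly so the number matches what nvidia-smi reports.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]
    )
    return int(out.decode().splitlines()[0])

print("before importing torch:", gpu_mem_used_mib(), "MiB")

import torch
torch.cuda.init()              # initializes PyTorch's CUDA state / context
torch.zeros(1, device="cuda")  # tiny allocation, in case init alone stays lazy
print("after CUDA init:       ", gpu_mem_used_mib(), "MiB")
# The difference is roughly the context overhead (a few hundred MB, depending on GPU/driver).
```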


(Ivan Markov) #3

Thank you very much for your reply.
I rechecked everything and now it seems that everything works as it should. 5,000,000 float32 parameters should take about 20 MB, and I guess it's my fault that I can't add numbers >_<. So in total it's about 600 MB for the PyTorch/CUDA init plus about 100 MB for each net with additional data (batches etc.).
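Spelling out the arithmetic (the 600 MB and 100 MB figures are just the rough values observed in nvidia-smi above):

```python
# Back-of-the-envelope check of the numbers above.
params = 5_000_000                 # parameters per network
bytes_per_float32 = 4
weights_mb = params * bytes_per_float32 / 1024**2
print(f"weights per net: ~{weights_mb:.0f} MB")   # ~19 MB

context_mb = 600                   # CUDA context + PyTorch init (from nvidia-smi)
per_net_mb = 100                   # weights + optimizer state + batches, roughly
total_mb = context_mb + 5 * per_net_mb
print(f"expected total:  ~{total_mb} MB")         # ~1100 MB, i.e. about 1 GB
```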
So PyTorch works perfectly :slight_smile: