Different memory usage on different GPUs of the same type


Hi, I am fairly new to using multiple GPUs, so this might be a stupid question, but I couldn't find an answer that fits my situation: the two GPUs are exactly the same model, yet they show drastically different memory usage throughout the whole training. As you can see, GPU 0 uses more than twice as much memory as GPU 1. Watching the training right now, the difference is even larger, with GPU 0 taking up about 12 GB. The Volatile GPU-Util reading is also almost always significantly higher for GPU 0.

I hope someone can help me understand this situation. Please let me know if you need more information to answer my question. Thanks!

I found this link, but is there a better solution that is part of the stable PyTorch release, now that more than a year has passed?


What that post says is that to get even usage, you should put everything inside the DataParallel block.
In your case, if some part of your net lives on only one GPU (GPU 0 by default), then it is expected that GPU 0 is used more, no?
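
To make that concrete, here is a rough sketch of what "putting everything in the DataParallel block" could look like. I don't know your actual code, so this is only an assumption about the setup: the names ModelWithLoss, build_parallel and train_step are made up for illustration. The idea is that the loss is computed inside forward, so each GPU computes the loss on its own slice of the batch and only small loss tensors are gathered on GPU 0, instead of the full outputs.

```python
import torch
import torch.nn as nn


class ModelWithLoss(nn.Module):
    """Hypothetical wrapper: runs the network and the loss in one forward pass."""

    def __init__(self, model: nn.Module, criterion: nn.Module):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, inputs, targets):
        outputs = self.model(inputs)
        # Return shape [1] instead of a scalar so the per-GPU losses
        # can be concatenated along dim 0 when they are gathered.
        return self.criterion(outputs, targets).unsqueeze(0)


def build_parallel(model: nn.Module, criterion: nn.Module) -> nn.DataParallel:
    """Wrap model and loss together, then hand the whole thing to DataParallel."""
    wrapped = ModelWithLoss(model, criterion)
    if torch.cuda.is_available():
        wrapped = wrapped.cuda()  # parameters live on the default GPU (GPU 0)
    return nn.DataParallel(wrapped)


def train_step(parallel_model, optimizer, inputs, targets) -> float:
    """One optimization step; averages the per-GPU losses."""
    optimizer.zero_grad()
    loss = parallel_model(inputs, targets).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

You would call it with something like `parallel_model = build_parallel(net, nn.CrossEntropyLoss())` and then `train_step(parallel_model, optimizer, x, y)` in your loop. Even with this, GPU 0 will probably still use somewhat more memory than the others, since the gathered results end up there.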