Hi, I am fairly new to using multiple GPUs, so this might be a stupid question, but I couldn't find an answer that fits my particular situation: the two GPUs are exactly the same model, yet they show drastically different memory usage throughout training. As you can see, GPU 0 has more than twice as much memory in use as GPU 1. Watching the training right now, the gap is even larger, with GPU 0 taking up about 12 GB. The Volatile GPU-Util reading is also almost always significantly higher for GPU 0.
I hope someone can help me understand what is going on here. Please let me know if you need more information to answer my question. Thanks!