Distributed evaluation/inference with different performance per GPU


I’m implementing distributed evaluation using DistributedDataParallel, and so far, with the help of this forum, it works quite well :slight_smile:

However, I noticed that when running it on two GPUs (Titan V), the second GPU is noticeably slower than the first one.

The first GPU does around 4 it/s, whereas the second only does about 3 it/s. I was wondering if there is a problem with my implementation, so here is the code:

Do you see any obvious flaws?

Both GPUs are using roughly the same amount of memory:

I also noticed that GPU 1 draws less power than GPU 0 most of the time.

Is there any explanation for the performance loss on the second GPU?
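When comparing it/s between ranks, it helps to time each rank the same way and to wait for queued GPU work before reading the clock, because CUDA kernel launches are asynchronous. The helper below is a minimal sketch, not the poster's code: `measure_throughput` and its parameters are hypothetical names, and `sync_fn` is assumed to be something like `torch.cuda.synchronize` in a real run.

```python
import time
from typing import Callable, Optional


def measure_throughput(
    step_fn: Callable[[], None],
    n_iters: int,
    warmup: int = 5,
    sync_fn: Optional[Callable[[], None]] = None,
) -> float:
    """Return iterations per second of step_fn, excluding warmup iterations.

    sync_fn (hypothetical hook) should block until queued GPU work has
    finished, e.g. torch.cuda.synchronize. Without it, the timing mostly
    reflects CPU-side kernel launch cost rather than GPU execution time.
    """
    if n_iters <= 0:
        raise ValueError("n_iters must be positive")
    # Warmup iterations absorb one-time costs (allocator, cuDNN autotuning).
    for _ in range(warmup):
        step_fn()
    if sync_fn is not None:
        sync_fn()  # make sure warmup work is done before starting the clock
    start = time.perf_counter()
    for _ in range(n_iters):
        step_fn()
    if sync_fn is not None:
        sync_fn()  # wait for the timed work to actually finish
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return n_iters / elapsed
```

In a DDP evaluation script, each rank could call this on a closure that runs one forward pass under `torch.no_grad()` and print the result together with its rank, so both GPUs are measured with identical warmup and synchronization.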


I’m using PyTorch 2.1.0.dev20230719 because of the problems described here.

I don’t know what kind of setup you are using, but did you profile the devices in other applications (i.e. not in PyTorch)? E.g., is the PCIe bandwidth the same for all devices, or is GPU 1 using fewer lanes, etc.?
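One way to check the PCIe link of each device outside of PyTorch is to query `nvidia-smi`. The sketch below assumes `nvidia-smi` is on the `PATH` and that the driver supports the listed `--query-gpu` fields (`nvidia-smi --help-query-gpu` prints the supported names); the function names are hypothetical. Since the link generation can drop while a GPU is idle, the query is most meaningful while the evaluation is running.

```python
import csv
import io
import subprocess
from typing import Dict, List

# Field names as accepted by `nvidia-smi --query-gpu=...` (assumed supported by the driver).
FIELDS = [
    "index",
    "name",
    "pcie.link.gen.current",
    "pcie.link.gen.max",
    "pcie.link.width.current",
    "pcie.link.width.max",
]


def parse_gpu_csv(text: str, fields: List[str] = FIELDS) -> List[Dict[str, str]]:
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        rows.append({f: v.strip() for f, v in zip(fields, row)})
    return rows


def query_gpus(fields: List[str] = FIELDS) -> List[Dict[str, str]]:
    """Run nvidia-smi once and return the parsed per-GPU values."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(fields), "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_csv(out, fields)


def narrow_links(rows: List[Dict[str, str]]) -> List[str]:
    """Return indices of GPUs whose current PCIe width is below the maximum."""
    return [
        r["index"]
        for r in rows
        if r["pcie.link.width.current"] != r["pcie.link.width.max"]
    ]
```

Calling `narrow_links(query_gpus())` during a run would list any GPU that trained to fewer lanes than its slot supports; an empty list means the widths match.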


OK, I found the problem: it’s simply heat. With the computer case open, both GPUs run at the same speed. Sorry for bothering you with that.
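Thermal throttling like this can be spotted by sampling temperature, SM clock, power draw, and the thermal-slowdown flags while the job runs. The sketch below assumes `nvidia-smi` is available and that it accepts the `clocks_throttle_reasons.*` fields, which report `Active` or `Not Active` (newer drivers also expose these as `clocks_event_reasons.*`; check `nvidia-smi --help-query-gpu`). `parse_sample`, `thermally_throttled`, and `monitor` are hypothetical helper names.

```python
import csv
import io
import subprocess
import time
from typing import Dict, List

# Assumed field names; verify with `nvidia-smi --help-query-gpu` on your driver.
FIELDS = [
    "index",
    "temperature.gpu",
    "clocks.sm",
    "power.draw",
    "clocks_throttle_reasons.sw_thermal_slowdown",
    "clocks_throttle_reasons.hw_thermal_slowdown",
]


def parse_sample(text: str) -> List[Dict[str, str]]:
    """Parse one `--format=csv,noheader,nounits` sample into a dict per GPU."""
    return [
        {f: v.strip() for f, v in zip(FIELDS, row)}
        for row in csv.reader(io.StringIO(text))
        if row
    ]


def thermally_throttled(row: Dict[str, str]) -> bool:
    """True if either the software or hardware thermal slowdown flag is set."""
    # Exact tuple membership, so "Not Active" does not match "Active".
    return "Active" in (row[FIELDS[4]], row[FIELDS[5]])


def monitor(interval_s: float = 2.0, samples: int = 10) -> None:
    """Print per-GPU temperature, SM clock, and power, flagging thermal slowdown."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader,nounits"]
    for _ in range(samples):
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        for row in parse_sample(out):
            flag = " THERMAL SLOWDOWN" if thermally_throttled(row) else ""
            print(
                f"GPU {row['index']}: {row['temperature.gpu']} C, "
                f"SM {row['clocks.sm']} MHz, {row['power.draw']} W{flag}"
            )
        time.sleep(interval_s)
```

Running `monitor()` in a second terminal during evaluation would show whether one GPU runs hotter and at a lower SM clock than the other, which is consistent with the lower power draw and it/s observed on GPU 1.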

Not at all and thanks for sharing the root cause as it can certainly help others!
