Heterogeneous GPUs but same computation time


I’m using an RTX 2080 Ti and a GTX 1050 Ti in a two-node cluster with PyTorch. The problem comes when I execute it (distributed): both of them take the same time to solve MNIST, even though the cards are very different. There are no sync points. Can anyone help me?

My CUDA version is 10.0 on the RTX and 9.2 on the GTX. I’m using PyTorch 1.2 with MPI 3.1.

Are you using DDP?
If so, the slower card will throttle the faster one, since DDP synchronizes gradients on every iteration.

Or are you profiling the cards separately? If so, your code might have other bottlenecks (e.g. data loading). Have you profiled it?
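To profile the cards in isolation, you could time a forward pass on each machine separately. Here is a minimal sketch (`time_forward` is an illustrative helper, not part of your code); it synchronizes the device when running on a GPU so the timer measures kernel completion rather than just launch latency:

```python
import time
import torch

def time_forward(model, batch, iters=10):
    """Average wall-clock time of a forward pass, in seconds."""
    device = next(model.parameters()).device
    model(batch)  # warm-up pass (CUDA context init, cuDNN autotuning)
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    start = time.perf_counter()
    for _ in range(iters):
        model(batch)
    if device.type == "cuda":
        # Without this, we'd only measure asynchronous kernel launches.
        torch.cuda.synchronize(device)
    return (time.perf_counter() - start) / iters
```

If the two cards report similar numbers here too, the bottleneck is elsewhere (e.g. the data loader); if they differ as expected, the distributed run is being gated by synchronization.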

We don’t guarantee compatibility between different versions of PyTorch. You say you have one version compiled against CUDA 10 and another against CUDA 9.2. This might work, but YMMV.

Yes, I’m using DDP, but I’m averaging gradients with asynchronous all_reduce, so there’s no synchronization unless I explicitly call a.wait() (which I’m not doing, just to test), right? Even in that case, training time stays the same, which makes no sense to me. Am I missing anything?