DistributedDataParallel with 2 different GPUs

Yes, your overall training might be bottlenecked by the slower GPU, since DDP synchronizes gradients across all ranks at each step, so the faster device will wait for the slower one. You are also right that a custom sampler would be needed to create the imbalanced split between the two devices.
This post describes the potential issues in more detail, and you could also profile the actual workload to see how severe the slowdown is.
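As a starting point, here is a minimal sketch of what such a sampler could look like. It is a weighted variant of `torch.utils.data.DistributedSampler`: every rank shuffles the full index list with the same seed and then takes a contiguous slice whose size is proportional to its fraction. The class name `WeightedDistributedSampler` and the example fractions are assumptions, not an existing PyTorch API, so treat this as a sketch to adapt rather than a drop-in solution.

```python
import torch
from torch.utils.data import Sampler

class WeightedDistributedSampler(Sampler):
    """Sketch of a DistributedSampler variant with unequal per-rank shards.

    `fractions` is a hypothetical parameter: one float per rank, summing to 1,
    e.g. (0.6, 0.4) to give the faster GPU 60% of the samples.
    """
    def __init__(self, dataset, num_replicas, rank, fractions, shuffle=True, seed=0):
        assert len(fractions) == num_replicas
        assert abs(sum(fractions) - 1.0) < 1e-6
        self.dataset = dataset
        self.shuffle = shuffle
        self.seed = seed
        self.epoch = 0
        # Compute the contiguous slice of the shuffled indices owned by this rank.
        sizes = [int(len(dataset) * f) for f in fractions]
        sizes[-1] = len(dataset) - sum(sizes[:-1])  # absorb rounding remainder
        self.start = sum(sizes[:rank])
        self.num_samples = sizes[rank]

    def __iter__(self):
        if self.shuffle:
            # Same seed on every rank, so all ranks see the same permutation
            # and the slices are disjoint.
            g = torch.Generator()
            g.manual_seed(self.seed + self.epoch)
            indices = torch.randperm(len(self.dataset), generator=g).tolist()
        else:
            indices = list(range(len(self.dataset)))
        return iter(indices[self.start:self.start + self.num_samples])

    def __len__(self):
        return self.num_samples

    def set_epoch(self, epoch):
        # Call this at the start of each epoch, as with DistributedSampler,
        # so shuffling differs across epochs.
        self.epoch = epoch
```

One caveat to keep in mind: DDP expects every rank to execute the same number of forward/backward passes per epoch, so with an imbalanced split you would also want to scale the per-rank batch size by the same fractions (e.g. 96 vs. 64 for a 60/40 split) so that both ranks run the same number of iterations.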
