Hey guys,
is it generally possible to use the nn.DataParallel wrapper if you have two different GPUs?
Thanks, Tobi
Yes, that’s possible.
However, you will get a warning if there is an imbalance in the GPU memory (i.e. one device has less memory than the other).
Also, the overall performance will be limited by the slowest GPU, so it might not be worth it if your GPUs have very different performance profiles.
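A minimal sketch of what that looks like, assuming two visible GPUs (cuda:0 and cuda:1) and a placeholder model; it's not code from this thread, just the usual wrapping pattern:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2).cuda(0)               # parameters must live on device_ids[0]
model = nn.DataParallel(model, device_ids=[0, 1])

inputs = torch.randn(64, 10, device="cuda:0")  # full batch on the default device
outputs = model(inputs)                        # chunks of 32 are scattered to each GPU
print(outputs.shape)                           # torch.Size([64, 2]), gathered on cuda:0
```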
Thanks for the reply!
I have another question: it seems that the two GPUs get the same share of the data. If the dataset is too large I get a CUDA out of memory error, but for a small dataset this is no problem. Do you have any idea how to fix this?
best, Tobi
Have a look at this blog post to see how nn.DataParallel works internally and how to counter some effects of an imbalanced memory usage.
PS: you could also try out nn.DistributedDataParallel, which shouldn't introduce the imbalance.
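For reference, a minimal single-machine sketch of nn.DistributedDataParallel with one process per GPU (assuming 2 GPUs, a placeholder model, and a free port 29500; not the code from the blog post):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = nn.Linear(10, 2).cuda(rank)          # replace with your model
    ddp_model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    # each process works on its own shard of the data
    inputs = torch.randn(32, 10).cuda(rank)
    targets = torch.randn(32, 2).cuda(rank)

    optimizer.zero_grad()
    loss = nn.functional.mse_loss(ddp_model(inputs), targets)
    loss.backward()                               # gradients are all-reduced across processes
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2                                # one process per GPU
    mp.spawn(run, args=(world_size,), nprocs=world_size, join=True)
```

Since each process holds its own replica and only synchronizes gradients, the memory imbalance caused by gathering outputs on a single device (as in nn.DataParallel) doesn't occur.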
Apologies for the old thread resurrection here.
If I have a single machine with 2 mismatched GPUs (1 fast with big memory, 1 slow with small memory), DataParallel will only go as fast as the slow GPU with less memory, making it not worth using - as was covered above.
Does DistributedDataParallel on a single machine with 2 mismatched GPUs make sense? Or am I better off simply using the single fast GPU?
I’m trying to understand the usage modes of DistributedDataParallel (SPSD, SPMD, etc.).
I’ve been trying to read this and am getting a bit confused: [POLL][RFC] Can we retire Single-Process Multi-Device Mode from DistributedDataParallel? · Issue #47012 · pytorch/pytorch · GitHub
Both parallel approaches would suffer from the slow GPU, so you would have to check the actual performance using the fast GPU only vs. DistributedDataParallel.
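One rough way to do that comparison (an assumed benchmarking approach, not from this thread): time a fixed number of training steps on the fast GPU alone, then run the same measurement with the DistributedDataParallel setup and compare samples per second. The model and shapes below are placeholders:

```python
import time
import torch
import torch.nn as nn

def benchmark(model, device, batch_size=64, steps=50):
    dev = torch.device(device)
    model = model.to(dev)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    inputs = torch.randn(batch_size, 10, device=dev)
    targets = torch.randn(batch_size, 2, device=dev)

    # warm-up steps so CUDA initialization does not skew the timing
    for _ in range(5):
        optimizer.zero_grad()
        nn.functional.mse_loss(model(inputs), targets).backward()
        optimizer.step()

    torch.cuda.synchronize(dev)
    start = time.perf_counter()
    for _ in range(steps):
        optimizer.zero_grad()
        nn.functional.mse_loss(model(inputs), targets).backward()
        optimizer.step()
    torch.cuda.synchronize(dev)
    return steps * batch_size / (time.perf_counter() - start)

print("fast GPU only:", benchmark(nn.Linear(10, 2), "cuda:0"), "samples/s")
```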