Multiple but different GPUs

Hey guys,

Is it generally possible to use the nn.DataParallel wrapper if you have two different GPUs?

Thanks, Tobi


Yes, that’s possible.
However, you will get a warning if there is an imbalance in the GPU memory (one device has less memory than the other).
Also, your performance will depend on the slowest GPU you are using, so it might not be recommended if you are using GPUs with very different performance profiles.
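
As a rough sketch (the model and tensor shapes here are placeholders, not anything from this thread), wrapping a module in nn.DataParallel across two mismatched devices could look like this:

```python
import torch
import torch.nn as nn

# Placeholder model; any nn.Module works the same way.
model = nn.Linear(1024, 10).cuda(0)

# device_ids[0] is where outputs are gathered and gradients reduced,
# so listing the larger-memory GPU first can reduce OOM risk there.
model = nn.DataParallel(model, device_ids=[0, 1])

x = torch.randn(64, 1024, device="cuda:0")
out = model(x)  # the batch of 64 is split into chunks of 32, one per GPU
```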


Thanks for the reply,

I have another question. It seems like the two GPUs get the same split of the data: if the dataset is too large, I get a CUDA out of memory error, but for a small dataset this is no problem. Do you have any idea how to fix this?

Best, Tobi

Have a look at this blog post to see how nn.DataParallel works internally and how to counter some effects of imbalanced memory usage.

PS: you could also try out nn.DistributedDataParallel, which shouldn’t introduce the imbalance.
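
A minimal single-machine nn.DistributedDataParallel sketch, assuming one process per GPU launched via torchrun; the model and shapes are again placeholders:

```python
# Launch with: torchrun --nproc_per_node=2 ddp_example.py
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")  # reads rank/world size from torchrun env vars
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; each process holds a full replica on its own GPU.
    model = nn.Linear(1024, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # Each process feeds its own batch; gradients are all-reduced in backward().
    x = torch.randn(32, 1024, device=f"cuda:{local_rank}")
    model(x).sum().backward()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```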

Apologies for resurrecting this old thread.

If I have a single machine with 2 mismatched GPUs (1 fast with big memory, 1 slow with small memory), DataParallel will only go as fast as the slow GPU with less memory, making it not worth using, as was covered above.

Does DistributedDataParallel on a single machine with 2 mismatched GPUs make sense? Or am I better off simply using the single fast GPU?

I’m trying to understand the usage modes of DistributedDataParallel (SPSD, SPMD, etc.)

I’ve been trying to read this and getting a bit confused: [POLL][RFC] Can we retire Single-Process Multi-Device Mode from DistributedDataParallel? · Issue #47012 · pytorch/pytorch · GitHub

Both parallel approaches would suffer from the slow GPU, so you would have to check the actual performance using the fast GPU only vs. DistributedDataParallel.
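
For a rough comparison, one could time a training step in each setup, e.g. with CUDA events; this sketch uses a hypothetical ms_per_step helper and a placeholder model:

```python
import torch
import torch.nn as nn

def ms_per_step(model, x, iters=50, warmup=10):
    # CUDA events give accurate GPU timing; synchronize before reading.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(warmup):
        model(x).sum().backward()
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        model(x).sum().backward()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Time the fast GPU alone; then repeat under DistributedDataParallel
# with the same global batch size and compare ms per step.
model = nn.Linear(1024, 10).cuda()
x = torch.randn(64, 1024, device="cuda")
print(f"{ms_per_step(model, x):.2f} ms/step")
```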


I’ve got a similar issue, but in my case it’s not so much the speed I’m worried about.

I’ve got a machine with a single 3090, and ideally I would purchase another, but they seem hard to come by since the release of the 40xx series.

If I buy a 4090 and put it in the same machine, am I right that I’ll be limited by the speed of the 3090, but will still be able to take advantage of the extra memory capacity from having two GPUs?

This could be the case, as the slowest part of your pipeline will create the bottleneck.
It’s unclear whether this would be the 3090 (or e.g. the data loading), as it depends on your actual use case.

Sorry, I should have been clearer. What I meant was: can I assume that if I’m using a 3090 and a 4090 in the same machine, I won’t get the full speed of the 4090 because I’m using it in conjunction with a slower GPU, but I will be able to take advantage of the additional GPU memory, so I can use larger models, batch sizes, etc.?

Yes, as mentioned before, you might not be able to get the full performance of the 4090, as it would have to wait for the slowest part of your entire training pipeline. That could be the 3090, but it could also be any other part, such as the data loading; even your 3090 might already be running into a bottleneck. I would therefore recommend profiling the workload to see how the overall training behaves.
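
As a starting point, a torch.profiler sketch could look like this (the model and batch below are placeholders standing in for the real workload):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Placeholder stand-ins for the real model and batch.
model = torch.nn.Linear(1024, 10).cuda()
x = torch.randn(64, 1024, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(10):
        model(x).sum().backward()

# Long CPU-side gaps point to e.g. data loading; long kernels point to the GPU.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```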
