We just bought a 4-way RTX 3090 24GB GPU server, and I want to confirm whether there will be a total of 48GB of memory when I use NVLink to connect two of the 3090s.
If the RTX 3090 supports this feature, how should I change my PyTorch code?
Thanks.
I want to confirm whether there will be a total of 48GB of memory when I use NVLink to connect two of the 3090s.
That sounds right, but since this is a GPU hardware spec-related question, I would ask the NVIDIA team directly.
If the RTX 3090 supports this feature, how should I change my PyTorch code?
With 2 GPUs connected via NVLink, I would use DistributedDataParallel (DDP) for training. At a high level, you spawn 2 processes, 1 for each GPU, and create an NCCL process group to get fast data transfer between the 2 GPUs. Then you can simply wrap your model with DDP and train.
Here are some references on how to implement this: the DistributedDataParallel tutorial and API docs on pytorch.org.
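For context, a minimal sketch of that setup might look like the following. The model, data, and hyperparameters here are placeholders (a toy nn.Linear and random tensors), not your actual code; swap in your own model and dataloader:

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train(rank, world_size):
    # One process per GPU; the NCCL backend uses NVLink/P2P when available.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Placeholder model; replace with your own.
    model = nn.Linear(1024, 10).cuda(rank)
    ddp_model = DDP(model, device_ids=[rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(10):
        inputs = torch.randn(32, 1024, device=f"cuda:{rank}")
        targets = torch.randint(0, 10, (32,), device=f"cuda:{rank}")
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), targets)
        loss.backward()   # gradients are all-reduced across the two GPUs here
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2  # the two NVLink-connected 3090s
    mp.spawn(train, args=(world_size,), nprocs=world_size, join=True)
```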
NVIDIA has officially said that it will stop supporting SLI, but the 3090 does still have the connector.
I am not sure if the loss of driver profiles also means the loss of driver support going forward, and I do not know if this will affect using the cards for computation. Hopefully someone else can chime in, as I am curious to know.
No, the devices should not show up as a single GPU with 48GB.
You can connect them via NVLink and use a data parallel or model parallel approach.
Actually, I understand that I won't see a single 48GB GPU in my server, but can I get an effective 48GB during training? For example, can I set a higher batch_size without any other changes?
I’m not sure what “without any other change” means, but you would either have to use a data parallel approach (which needs code changes) or model parallel (also known as model sharding), which also needs code changes.
But yes, you can use multiple devices in PyTorch.
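For illustration, here is a minimal model-sharding sketch. It assumes a toy two-layer model split across cuda:0 and cuda:1 (the layer names and sizes are made up); the point is that parameters and activations are spread over both cards, so the combined memory can exceed a single 24GB GPU:

```python
import torch
import torch.nn as nn


class ShardedModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First part of the model lives on GPU 0, second part on GPU 1.
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        # Move the intermediate activations to the second GPU
        # (this device-to-device transfer is what NVLink speeds up).
        return self.part2(x.to("cuda:1"))


model = ShardedModel()
out = model(torch.randn(32, 1024))
print(out.device)  # cuda:1
```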
Hi, sorry for my questions, but I have some doubts about data parallel. Can I use data parallel on two NVIDIA RTX 3060 GPUs, which are not compatible with NVLink? Or do I necessarily need GPUs with NVLink or SLI to use data parallel?
You do not need NVLink to use data parallel; NVLink just makes the interconnect bandwidth between the GPUs higher, it is not required.
So I can use data parallel with any GPUs, right?
Thank you so much!!
Yes, you should be able to use nn.DataParallel with any GPUs, including your 3060s.
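As a rough illustration (the model and batch below are placeholders), wrapping a model in nn.DataParallel is essentially a one-line change:

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 10).cuda()
# Replicates the model on both GPUs and splits each input batch between them.
model = nn.DataParallel(model, device_ids=[0, 1])

inputs = torch.randn(64, 1024).cuda()   # batch is split: 32 samples per GPU
outputs = model(inputs)                 # results gathered back on device 0
print(outputs.shape)  # torch.Size([64, 10])
```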
As @spacecraft1013 explained, you won’t necessarily need NVLink, but it would speed up the peer-to-peer communication.
Also note that we generally recommend using DistributedDataParallel with a single process per device for the best performance.
Thank you so much @ptrblck
Do you know how much faster training would be on 2x 3090 with NVLink compared to 2x 3090 without NVLink? 10%? 30%? 50%?