We just bought a 4-way RTX 3090 24GB GPU server, and I want to confirm whether there will be a total of 48GB of memory when I use NVLink to connect two of the 3090s.
If the RTX 3090 supports this feature, how should I change my PyTorch code?
Thanks.
I want to confirm whether there will be a total of 48GB of memory when I use NVLink to connect two of the 3090s.
That sounds right, but since this is a GPU hardware spec-related question, I would ask the NVIDIA team directly.
If the RTX 3090 supports this feature, how should I change my PyTorch code?
With 2 GPUs connected via NVLink, I would use DistributedDataParallel (DDP) for training. At a high level, you spawn 2 processes, 1 for each GPU, and create an NCCL process group to get fast data transfer between the 2 GPUs. Then you can simply wrap your model with DDP and train.
Here are some references on how to implement this: the DistributedDataParallel tutorial and API docs on pytorch.org.
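For context, a minimal sketch of that setup might look like the following. The model, data, and hyperparameters here are placeholders (a toy nn.Linear and random tensors), not your actual code; swap in your own model and dataloader:

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train(rank, world_size):
    # One process per GPU; the NCCL backend uses NVLink/P2P when available.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Placeholder model; replace with your own.
    model = nn.Linear(1024, 10).cuda(rank)
    ddp_model = DDP(model, device_ids=[rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(10):
        inputs = torch.randn(32, 1024, device=f"cuda:{rank}")
        targets = torch.randint(0, 10, (32,), device=f"cuda:{rank}")
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), targets)
        loss.backward()   # gradients are all-reduced across the two GPUs here
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2  # the two NVLink-connected 3090s
    mp.spawn(train, args=(world_size,), nprocs=world_size, join=True)
```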
NVIDIA has officially said that it will stop supporting SLI, but the 3090 does still have the connector.
I am not sure if the loss of driver profiles also means the loss of driver support going forward, and I do not know if this will affect using the cards for computation. Hopefully someone else can chime in, as I am curious to know.
No, the devices should not show up as a single GPU with 48GB.
You can connect them via NVLink and use a data parallel or model parallel approach.
Actually, I understand that I won't see a single 48GB GPU in my server, but can I get an effective 48GB during training? For example, can I set a higher batch_size without any other changes?
I’m not sure what “without any other change” means, but you would either have to use a data parallel approach (which needs code changes) or model parallel (also known as model sharding), which also needs code changes.
But yes, you can use multiple devices in PyTorch.
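For illustration, here is a minimal model-sharding sketch. It assumes a toy two-layer model split across cuda:0 and cuda:1 (the layer names and sizes are made up); the point is that parameters and activations are spread over both cards, so the combined memory can exceed a single 24GB GPU:

```python
import torch
import torch.nn as nn


class ShardedModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First part of the model lives on GPU 0, second part on GPU 1.
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        # Move the intermediate activations to the second GPU
        # (this device-to-device transfer is what NVLink speeds up).
        return self.part2(x.to("cuda:1"))


model = ShardedModel()
out = model(torch.randn(32, 1024))
print(out.device)  # cuda:1
```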
Hi, sorry for my questions, but I have some doubts about data parallel. Can I use data parallel on two NVIDIA RTX 3060 GPUs, which are not compatible with NVLink? Or do I necessarily need GPUs with NVLink or SLI to use data parallel?
You do not need NVLink to use data parallel; NVLink just makes the interconnect bandwidth between the GPUs higher, it is not required.
So I can use data parallel with any GPUs, right?
Thank you so much!!
Yes, you should be able to use nn.DataParallel with any GPUs, including your 3060s.
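As a rough illustration (the model and batch below are placeholders), wrapping a model in nn.DataParallel is essentially a one-line change:

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 10).cuda()
# Replicates the model on both GPUs and splits each input batch between them.
model = nn.DataParallel(model, device_ids=[0, 1])

inputs = torch.randn(64, 1024).cuda()   # batch is split: 32 samples per GPU
outputs = model(inputs)                 # results gathered back on device 0
print(outputs.shape)  # torch.Size([64, 10])
```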
As @spacecraft1013 explained, you won’t necessarily need NVLink, but it would speed up the peer-to-peer communication.
Also note that we generally recommend using DistributedDataParallel with a single process per device for the best performance.
Thank you so much @ptrblck
Do you know how much faster training would be on 2x 3090 with NVLink compared to 2x 3090 without NVLink? 10%? 30%? 50%?