Determine communication backend

Hey everyone,

I have a model spread across a couple of GPUs:

class MicroUNet3D(nn.Module):

    def __init__(self, n_channels, n_classes):
        super(MicroUNet3D, self).__init__()
        self.inconv = InConv(n_channels, 2).to('cuda:0')
        self.down1 = Down(2, 4).to('cuda:0')
        self.down2 = Down(4, 8).to('cuda:0')
        self.up1 = Up(8, 4).to('cuda:1')
        self.up2 = Up(4, 2).to('cuda:1')
        self.outconv = OutConv(2, n_classes).to('cuda:1')

    def forward(self, x):
        x1 = self.inconv(x)
        x2, indices1 = self.down1(x1)
        x3, indices2 = self.down2(x2)

        # Transfer to next GPU.
        x2, indices1 ='cuda:1'),'cuda:1')
        x3, indices2 ='cuda:1'),'cuda:1')

        x4 = self.up1(x3, indices2, x2.shape)
        x5 = self.up2(x4, indices1, x1.shape)
        x6 = self.outconv(x5)
        return x6

Is there a way to determine how the communication is handled by the to() method? I am hoping that PyTorch will use NCCL here, and I would like to make sure.

No, PyTorch does not use NCCL for to() (copying from one GPU to another). That's not one of the operations provided by NCCL.

PyTorch does use NCCL as a distributed backend and for DataParallel broadcast and reduction.
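For reference, here is a minimal sketch of selecting NCCL as the distributed backend (single process, world size 1, with gloo as a CPU-only fallback; the address and port values are placeholders for a real launcher):

```python
import os
import torch
import torch.distributed as dist

# Choose NCCL when GPUs are present; gloo is the CPU-only fallback.
backend = "nccl" if torch.cuda.is_available() else "gloo"

# Single-process sketch (world size 1); real jobs launch one process per GPU.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend, rank=0, world_size=1)

print(dist.get_backend())
dist.destroy_process_group()
```

With more than one process, collectives such as dist.all_reduce would then go over NCCL on the GPU ranks.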

Ah yeah, for some reason I was thinking that NCCL could do send/recv. Does that mean that to() is coming down to main memory to then transfer over to the next GPU? Could it use NVLink?

Yes, it will use NVLink if available. The choice of how to communicate is made by the CUDA driver. PyTorch just calls cudaMemcpy (or launches a P2P kernel in some cases).
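To see what the driver can do on a given machine, you can query peer access directly before relying on fast device-to-device copies (a small sketch; the helper name is my own):

```python
import torch

def gpu_copy_path(src: int = 0, dst: int = 1) -> str:
    """Describe how a .to() copy between two GPUs is likely to travel.
    If peer access is possible, the driver can copy directly over
    NVLink or PCIe P2P; otherwise the copy is staged through host RAM."""
    if torch.cuda.device_count() <= max(src, dst):
        return "fewer than two GPUs visible"
    if torch.cuda.can_device_access_peer(src, dst):
        return f"cuda:{src} -> cuda:{dst}: direct peer-to-peer copy"
    return f"cuda:{src} -> cuda:{dst}: staged through host memory"

print(gpu_copy_path())
```

On a box with NVLink-connected GPUs this should report the direct path; nvidia-smi topo -m shows the same topology from the driver's point of view.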
