Determine communication backend

Hey everyone,

I have a model spread across a couple of GPUs:

import torch
import torch.nn as nn

# InConv, Down, Up, and OutConv are custom building blocks defined elsewhere.
class MicroUNet3D(nn.Module):

    def __init__(self, n_channels, n_classes):
        super(MicroUNet3D, self).__init__()
        self.inconv = InConv(n_channels, 2).to('cuda:0')
        self.down1 = Down(2, 4).to('cuda:0')
        self.down2 = Down(4, 8).to('cuda:0')
        self.up1 = Up(8, 4).to('cuda:1')
        self.up2 = Up(4, 2).to('cuda:1')
        self.outconv = OutConv(2, n_classes).to('cuda:1')
    def forward(self, x):
        x1 = self.inconv(x)
        x2, indices1 = self.down1(x1)
        x3, indices2 = self.down2(x2)

        # Transfer to next GPU.
        x2, indices1 = x2.to('cuda:1'), indices1.to('cuda:1')
        x3, indices2 = x3.to('cuda:1'), indices2.to('cuda:1')

        x4 = self.up1(x3, indices2, x2.shape)
        x5 = self.up2(x4, indices1, x1.shape)
        x6 = self.outconv(x5)
        return x6

Is there a way to determine how the communication is handled by the to() method? I am hoping that PyTorch will use NCCL here, and I would like to make sure.

No, PyTorch does not use NCCL for to() (copying from one GPU to another). That's not one of the operations NCCL provides (https://github.com/NVIDIA/nccl#whats-inside).

PyTorch does use NCCL as a distributed backend and for DataParallel broadcast and reduction.
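For reference, NCCL only comes into play when you explicitly ask for it, e.g. by initializing a distributed process group. Here is a minimal sketch (the init_nccl helper is just illustrative; it assumes one process per GPU and that MASTER_ADDR/MASTER_PORT are set in the environment):

import torch
import torch.distributed as dist

def init_nccl(rank, world_size):
    # Explicitly select NCCL as the torch.distributed backend.
    dist.init_process_group(backend='nccl', rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Collectives such as all_reduce now run over NCCL.
    t = torch.ones(1, device=f'cuda:{rank}')
    dist.all_reduce(t)
    dist.destroy_process_group()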

Ah yeah, for some reason I was thinking that NCCL could do send/recv. Does that mean that to() goes down to main memory and then transfers over to the next GPU? Could it use NVLink?

Yes, it will use NVLink if available. The choice of how to communicate is made by the CUDA driver. PyTorch just calls cudaMemcpy (or launches a P2P kernel in some cases).
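If you want to check whether a direct peer-to-peer path exists between the two devices, a quick sketch (the routing decision itself still belongs to the driver):

import torch

# True means device 0 can access device 1's memory directly, so the copy
# can go over NVLink or PCIe peer-to-peer instead of being staged
# through host memory.
print(torch.cuda.can_device_access_peer(0, 1))

The physical link topology can also be inspected outside Python with nvidia-smi topo -m.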
