Model Parallelism Across Multiple Devices

Is it possible to manually partition a DNN and run it in parallel, for inference only, across a few CPUs/GPUs?

For example, consider this dummy model: there are four convolutions that can run in parallel without any dependencies, and there are four GPUs available. I would like to run each convolution on an independent GPU. Is there a way to do model-parallel inference across GPUs?

import torch.nn as nn

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # nn.Conv2d requires in_channels, out_channels, and kernel_size;
        # the arguments below are placeholders
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.conv2 = nn.Conv2d(3, 16, 3)
        self.conv3 = nn.Conv2d(3, 16, 3)
        self.conv4 = nn.Conv2d(3, 16, 3)

    def forward(self, x):
        y1 = self.conv1(x)
        y2 = self.conv2(x)
        y3 = self.conv3(x)
        y4 = self.conv4(x)
        return [y1, y2, y3, y4]

@ptrblck Any solution?

Yes, you could push each module to its corresponding device and execute each forward pass with an input on that device.
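
A minimal sketch of that idea, assuming four visible CUDA devices (the device indices and Conv2d arguments are placeholders):

import torch
import torch.nn as nn

class ParallelNet(nn.Module):

    def __init__(self):
        super(ParallelNet, self).__init__()
        # push each branch to its own GPU (assumes 4 visible CUDA devices)
        self.conv1 = nn.Conv2d(3, 16, 3).to('cuda:0')
        self.conv2 = nn.Conv2d(3, 16, 3).to('cuda:1')
        self.conv3 = nn.Conv2d(3, 16, 3).to('cuda:2')
        self.conv4 = nn.Conv2d(3, 16, 3).to('cuda:3')

    def forward(self, x):
        # copy the input to each device; CUDA kernels are launched
        # asynchronously, so the four branches can overlap in time
        y1 = self.conv1(x.to('cuda:0'))
        y2 = self.conv2(x.to('cuda:1'))
        y3 = self.conv3(x.to('cuda:2'))
        y4 = self.conv4(x.to('cuda:3'))
        return [y1, y2, y3, y4]

model = ParallelNet()
with torch.no_grad():  # inference only
    outs = model(torch.randn(1, 3, 32, 32))

Since CUDA operations are asynchronous with respect to the host, the four forward passes are enqueued on their devices without waiting for each other; synchronization only happens when you use the outputs (e.g. move them back to one device).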
