Is it possible to manually partition a DNN and run it in parallel, for inference only, across a few CPUs/GPUs?
For example, consider a dummy model: there are four convolutions with no dependencies between them, so they can run in parallel, and there are four devices available. I would like to run each convolution on an independent GPU. Is there a way to do model-parallel inference across GPUs?
```python
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # example arguments; the real layer sizes don't matter here
        self.conv1 = nn.Conv2d(3, 8, 3)
        self.conv2 = nn.Conv2d(3, 8, 3)
        self.conv3 = nn.Conv2d(3, 8, 3)
        self.conv4 = nn.Conv2d(3, 8, 3)

    def forward(self, x):
        y1 = self.conv1(x)
        y2 = self.conv2(x)
        y3 = self.conv3(x)
        y4 = self.conv4(x)
        return [y1, y2, y3, y4]
```
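One way this could be sketched (within a single process, not across machines) is to pin each branch to its own device and move the input to that device in `forward`. CUDA kernel launches are asynchronous, so the four branches can overlap on different GPUs. The layer sizes, device names, and the CPU fallback below are assumptions for illustration:

```python
# Hypothetical sketch: one device per branch, falling back to CPU when
# fewer than four GPUs are available (assumption for illustration).
import torch
import torch.nn as nn

class ParallelNet(nn.Module):
    def __init__(self):
        super().__init__()
        n = torch.cuda.device_count()
        # assign device i to branch i; reuse the CPU if no GPU i exists
        self.devices = [
            torch.device(f"cuda:{i}") if i < n else torch.device("cpu")
            for i in range(4)
        ]
        # example layer sizes: 3 in-channels, 8 out-channels, 3x3 kernel
        self.convs = nn.ModuleList(
            nn.Conv2d(3, 8, kernel_size=3, padding=1).to(d)
            for d in self.devices
        )

    def forward(self, x):
        # copy the input to each branch's device; on separate GPUs the
        # convolutions are launched asynchronously and can run concurrently
        outs = [conv(x.to(d)) for conv, d in zip(self.convs, self.devices)]
        # gather the results back onto the input's device
        return [o.to(x.device) for o in outs]

model = ParallelNet().eval()
with torch.no_grad():
    ys = model(torch.randn(1, 3, 16, 16))
```

Note this covers multiple GPUs in one machine; spreading branches across separate machines would need a distributed runtime (e.g. `torch.distributed` RPC) rather than simple `.to(device)` placement.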