Is it possible to manually partition a DNN and run it in parallel, for inference only, across a few CPUs/GPUs?
For example, consider a dummy model: there are four convolutions with no dependencies between them, so they can run in parallel, and there are four devices available. I would like to run each convolution on an independent GPU. Is there a way to do model-parallel inference across GPUs?
```python
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # example arguments; the real layer sizes don't matter here
        self.conv1 = nn.Conv2d(3, 8, 3)
        self.conv2 = nn.Conv2d(3, 8, 3)
        self.conv3 = nn.Conv2d(3, 8, 3)
        self.conv4 = nn.Conv2d(3, 8, 3)

    def forward(self, x):
        y1 = self.conv1(x)
        y2 = self.conv2(x)
        y3 = self.conv3(x)
        y4 = self.conv4(x)
        return [y1, y2, y3, y4]
```
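One way this could be sketched (within a single process, not across machines) is to pin each branch to its own device and move the input to that device in `forward`. CUDA kernel launches are asynchronous, so the four branches can overlap on different GPUs. The layer sizes, device names, and the CPU fallback below are assumptions for illustration:

```python
# Hypothetical sketch: one device per branch, falling back to CPU when
# fewer than four GPUs are available (assumption for illustration).
import torch
import torch.nn as nn

class ParallelNet(nn.Module):
    def __init__(self):
        super().__init__()
        n = torch.cuda.device_count()
        # assign device i to branch i; reuse the CPU if no GPU i exists
        self.devices = [
            torch.device(f"cuda:{i}") if i < n else torch.device("cpu")
            for i in range(4)
        ]
        # example layer sizes: 3 in-channels, 8 out-channels, 3x3 kernel
        self.convs = nn.ModuleList(
            nn.Conv2d(3, 8, kernel_size=3, padding=1).to(d)
            for d in self.devices
        )

    def forward(self, x):
        # copy the input to each branch's device; on separate GPUs the
        # convolutions are launched asynchronously and can run concurrently
        outs = [conv(x.to(d)) for conv, d in zip(self.convs, self.devices)]
        # gather the results back onto the input's device
        return [o.to(x.device) for o in outs]

model = ParallelNet().eval()
with torch.no_grad():
    ys = model(torch.randn(1, 3, 16, 16))
```

Note this covers multiple GPUs in one machine; spreading branches across separate machines would need a distributed runtime (e.g. `torch.distributed` RPC) rather than simple `.to(device)` placement.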