import torch
import torch.nn as nn

class ToyMpModel(nn.Module):
    def __init__(self, dev0, dev1):
        super(ToyMpModel, self).__init__()
        self.dev0 = dev0
        self.dev1 = dev1
        # place the two halves of the model on different devices
        self.net1 = torch.nn.Linear(10, 10).to(dev0)
        self.relu = torch.nn.ReLU()
        self.net2 = torch.nn.Linear(10, 5).to(dev1)

    def forward(self, x):
        x = x.to(self.dev0)
        x = self.relu(self.net1(x))
        # move the activations to the second device before the second half
        x = x.to(self.dev1)
        return self.net2(x)
How can I split a pretrained model (DeeplabV3Resnet101) onto different GPUs?
import torch.nn as nn
from torchvision import models

def getDeepLabV3Resnet101Pretrained(num_of_classes):
    model = models.segmentation.deeplabv3_resnet101(pretrained=True)
    # Change the number of output classes
    model.classifier[4] = nn.Conv2d(
        in_channels=256,
        out_channels=num_of_classes,
        kernel_size=1,
        stride=1
    )
    # And now how to put different model parts on different GPUs?
    # Does model.children() help?
    return model
How would you determine where to split?
I would try to calculate the number of parameters for every model layer and then make more or less equal splits.
I think you will need to manually place different layers on different GPUs. After that, you will need to adapt your forward function (similar to the ToyMpModel example you referenced): send the input batch to the first GPU, get the activations after passing through all of the layers on that GPU, then send those activations to the next GPU, and so on until the last layer on the last GPU. A sketch is below.
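Here is a minimal sketch of that idea for the DeepLab model above, assuming torchvision’s deeplabv3_resnet101, whose backbone returns an OrderedDict with the features under the “out” key; the SplitDeepLab wrapper name is made up for illustration, and the auxiliary classifier is ignored here:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class SplitDeepLab(nn.Module):
    # Hypothetical wrapper: backbone on dev0, classifier head on dev1
    def __init__(self, model, dev0, dev1):
        super().__init__()
        self.dev0 = dev0
        self.dev1 = dev1
        self.backbone = model.backbone.to(dev0)
        self.classifier = model.classifier.to(dev1)

    def forward(self, x):
        input_shape = x.shape[-2:]
        features = self.backbone(x.to(self.dev0))["out"]
        # the activations come out on dev0; move them before running the head
        out = self.classifier(features.to(self.dev1))
        # upsample back to the input resolution, as torchvision's own forward does
        return F.interpolate(out, size=input_shape, mode="bilinear", align_corners=False)

model = models.segmentation.deeplabv3_resnet101(pretrained=True)
split_model = SplitDeepLab(model, "cuda:0", "cuda:1")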
We currently don’t provide an automated way of splitting the model optimally across machines, but the approach you mentioned should work. In essence, I would compute the number of parameters in the model and try to create equal splits so that each GPU gets a roughly similar number of parameters.
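As a rough sketch of that count, you can use named_children() to see how the parameters are distributed over the top-level modules:

total = sum(p.numel() for p in model.parameters())
for name, child in model.named_children():
    n = sum(p.numel() for p in child.parameters())
    print(f"{name}: {n} parameters ({100 * n / total:.1f}% of total)")

If most of the weight turns out to sit in one child (e.g. the backbone), an even split may require going one level deeper, e.g. iterating over model.backbone.named_children() as well.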
Do I also need to change this, or does this “.to” work with nn.Sequential (no separate forward function) as well?
“.to” works on nn.Sequential, but you still need to modify the forward function: once execution of the module on GPU0 has finished, the output will be on GPU0. Since the next module you want to execute is on GPU1, you have to move the output from GPU0 to GPU1 manually (using “.to”) and then run the module on GPU1.
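For instance, a small sketch with two nn.Sequential halves (the split point and layer sizes are just placeholders):

import torch
import torch.nn as nn

class TwoGpuSequential(nn.Module):
    # Hypothetical example: one nn.Sequential per GPU
    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.part1 = nn.Sequential(nn.Linear(10, 10), nn.ReLU()).to(dev0)
        self.part2 = nn.Sequential(nn.Linear(10, 5)).to(dev1)

    def forward(self, x):
        x = self.part1(x.to(self.dev0))
        # the output lives on dev0; move it to dev1 before the second half
        return self.part2(x.to(self.dev1))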