Parallel training of a network with multiple branches

Hi, I am a beginner in Python and PyTorch, so please forgive me if this is a simple question.

I have a model with separate branches, like this:

import torch.nn as nn

class mynet(nn.Module):
    def __init__(self, num):
        super().__init__()
        # One independent subnet per branch, registered via ModuleList
        # so their parameters are tracked by the parent module.
        self.subnet = nn.ModuleList(mysubnet() for _ in range(num))

    def forward(self, x):
        # x is a sequence with one input tensor per branch; each subnet
        # processes its own input, and the outputs are returned as a list.
        return [net(xi) for net, xi in zip(self.subnet, x)]
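
For context, the subnets in my real code are small networks. Here is a simplified stand-in for mysubnet (just a single Linear layer, for illustration) and how I call the model:

import torch

class mysubnet(nn.Module):
    # Stand-in for my real subnet, which is a small network.
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

num = 4
model = mynet(num)
x = [torch.randn(32, 10) for _ in range(num)]  # one input tensor per branch
y = model(x)                                    # list of num outputs
loss = sum(out.sum() for out in y)
loss.backward()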

Since the subnets in the model are small, I wonder if it is possible to compute the forward pass and loss.backward() for all branches in parallel?

Thanks in advance!

Without multiple hardware devices it might be difficult to realize a speedup, since each subnet would be contending for the same hardware resources. What layers are used in each subnet? Would it be possible to combine them in some way (e.g., grouped convolutions for parallel convolutions)?
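
For example, assuming each subnet were a single Conv2d with identical kernel sizes and channel counts (a hypothetical setup, since the actual layers haven't been posted), the Python loop over branches could be fused into one grouped convolution. A minimal sketch:

import torch
import torch.nn as nn

# Hypothetical shapes: num parallel branches, each a Conv2d(in_ch, out_ch, 3).
num, in_ch, out_ch = 4, 8, 16

# Loop version: one small convolution (and one kernel launch) per branch.
separate = nn.ModuleList(
    nn.Conv2d(in_ch, out_ch, 3, padding=1) for _ in range(num)
)

# Fused version: a single grouped convolution over the stacked channels,
# where each group corresponds to one branch.
fused = nn.Conv2d(num * in_ch, num * out_ch, 3, padding=1, groups=num)

xs = [torch.randn(2, in_ch, 32, 32) for _ in range(num)]

# Loop: list of num outputs, each of shape (2, out_ch, 32, 32).
ys_loop = [conv(x) for conv, x in zip(separate, xs)]

# Fused: concatenate inputs along channels, run one kernel, then split
# the result back into per-branch outputs of the same shapes.
ys_fused = fused(torch.cat(xs, dim=1)).chunk(num, dim=1)

With the weights copied over, the two versions would compute the same outputs, but the fused version launches a single kernel instead of num small ones, which usually utilizes the device much better. A similar idea can apply to other layer types, e.g. fusing parallel linear layers into one batched matrix multiply with torch.bmm.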