Parallel training of a network with multiple branches

Hi, I am a beginner in Python and PyTorch, so please forgive me if this is a simple question.

I have a model with separate branches, like this:

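```python
import torch.nn as nn

class mynet(nn.Module):
    def __init__(self, num):
        super(mynet, self).__init__()
        self.subnet = nn.ModuleList()
        for _ in range(num):
            # Hypothetical placeholder branch; the real subnet layers were not shown
            self.subnet.append(nn.Linear(16, 8))

    def forward(self, x):
        y = []
        for i in range(len(self.subnet)):
            # Each branch receives the same input x
            y.append(self.subnet[i](x))
        return y
```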

Since the subnets are small, is it possible to compute their forward passes and loss.backward() in parallel?

Thanks in advance!

Without multiple hardware devices, it may be difficult to get a speedup, because every subnet would be contending for the same hardware resources. What layers does each subnet use? Could they be combined in some way (e.g., grouped convolutions in place of several parallel convolutions)?
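To illustrate the grouped-convolution suggestion: K parallel convolutions that all take the same input can be fused into a single `nn.Conv2d` with `groups=K`, provided the input is repeated K times along the channel dimension. The sketch below assumes three branches with arbitrary channel counts and kernel size (none of these come from the thread). It copies the per-branch weights into the grouped layer only to show that both paths compute the same result.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_branches, in_ch, out_ch = 3, 4, 6  # illustrative sizes

# Baseline: separate convolutions, each applied to the same input
branches = nn.ModuleList(
    [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1) for _ in range(num_branches)]
)

# Fused: one grouped convolution. groups=num_branches means output channels
# [k*out_ch, (k+1)*out_ch) only see input channels [k*in_ch, (k+1)*in_ch).
fused = nn.Conv2d(num_branches * in_ch, num_branches * out_ch,
                  kernel_size=3, padding=1, groups=num_branches)

# Copy the branch weights into the grouped layer so both compute the same thing.
# Grouped weight shape: (num_branches * out_ch, in_ch, 3, 3)
with torch.no_grad():
    fused.weight.copy_(torch.cat([b.weight for b in branches], dim=0))
    fused.bias.copy_(torch.cat([b.bias for b in branches], dim=0))

x = torch.randn(2, in_ch, 8, 8)
looped = torch.cat([b(x) for b in branches], dim=1)   # (2, 18, 8, 8), Python loop
grouped = fused(x.repeat(1, num_branches, 1, 1))      # (2, 18, 8, 8), one layer call
```

In practice you would train `fused` directly instead of the `ModuleList`, and a single `loss.backward()` covers all branches. Whether this is actually faster depends on the GPU and the layer sizes, so it is worth profiling both versions.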

Please forgive me for hijacking this thread, but I have the same question and would very much appreciate more detail, especially on the syntax.

In the big picture, I'm looking to define something like the network pictured in this post, but with the arrows reversed. In other words, I want the input to be a set of identically sized tensors, each processed by one or more layers of its own subnet (learning local representations), before everything feeds into one larger layer for further processing. The target hardware is a GPU, and, as the original poster noted, looping over the subnets seems likely to be inefficient.

In my specific case, I have a set of voxels of interest (so not a full cube). For each voxel I have a specific x, y, z coordinate (real values) and a 10-bit one-hot encoding for the point in the voxel (also stored as reals, since a tensor can have only one dtype). The input for this example therefore has size (N, 13), where N is fixed and on the order of 50 to 100 depending on the model. Ideally, each 13-element vector would be processed by its own fully connected, multi-layer subnet before everything feeds into a single larger layer.
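For the (N, 13) case, one possible approach (a sketch, not a tested recipe) is to store each layer's weights for all N subnets in a single tensor of shape (N, in, out) and apply them with `torch.einsum`, so that weight slice v only ever touches row v of the input. The class name `PerVoxelMLP` and the sizes `hidden=32`, `sub_out=16` and `final_out=64` are hypothetical; only the 13 input features come from the description above. The subnet outputs are flattened into one shared `nn.Linear`.

```python
import torch
import torch.nn as nn

class PerVoxelMLP(nn.Module):
    """Hypothetical model: each of the N input rows goes through its own
    2-layer MLP, then all subnet outputs feed one shared layer."""

    def __init__(self, n_voxels, in_features=13, hidden=32, sub_out=16, final_out=64):
        super().__init__()
        # Stacked weights: slice [v] belongs to voxel v only, so subnets never mix.
        self.w1 = nn.Parameter(torch.randn(n_voxels, in_features, hidden) * in_features ** -0.5)
        self.b1 = nn.Parameter(torch.zeros(n_voxels, hidden))
        self.w2 = nn.Parameter(torch.randn(n_voxels, hidden, sub_out) * hidden ** -0.5)
        self.b2 = nn.Parameter(torch.zeros(n_voxels, sub_out))
        # Shared layer that sees the concatenated outputs of all subnets.
        self.head = nn.Linear(n_voxels * sub_out, final_out)

    def forward(self, x):
        # x: (batch, n_voxels, in_features)
        h = torch.relu(torch.einsum('bvi,vih->bvh', x, self.w1) + self.b1)
        h = torch.relu(torch.einsum('bvh,vho->bvo', h, self.w2) + self.b2)
        return self.head(h.flatten(start_dim=1))  # (batch, final_out)


model = PerVoxelMLP(n_voxels=50)
x = torch.randn(8, 50, 13)  # batch of 8 samples, N = 50 voxels
out = model(x)
```

All N subnets run as one batched operation per layer instead of N small ones, and autograd handles the backward pass for every subnet in a single `loss.backward()` call.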

Using convolutional layers is intriguing, but I am not clear on the syntax required to define non-overlapping subnets without forcing the subnets to have smaller output layers, which is what I see in most convolutional examples.
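On the convolution syntax: a 1-D convolution with `kernel_size=1` and `groups=N` behaves like N independent linear layers, one per group of channels. Each group's output width is `out_channels // groups`, so it can be larger than the 13 inputs rather than smaller. The sketch below assumes the (batch, N, 13) input is reshaped so that each voxel's 13 features form one channel group; `hidden=32` and the head size of 64 are illustrative choices, not values from the thread.

```python
import torch
import torch.nn as nn

n_voxels, in_features, hidden = 50, 13, 32  # hidden > in_features is allowed

# A grouped 1x1 convolution is a set of independent linear layers:
# group v maps channels [v*13, (v+1)*13) to channels [v*32, (v+1)*32).
subnets = nn.Sequential(
    nn.Conv1d(n_voxels * in_features, n_voxels * hidden, kernel_size=1, groups=n_voxels),
    nn.ReLU(),
    nn.Conv1d(n_voxels * hidden, n_voxels * hidden, kernel_size=1, groups=n_voxels),
    nn.ReLU(),
)
head = nn.Linear(n_voxels * hidden, 64)  # shared layer after the subnets

x = torch.randn(8, n_voxels, in_features)    # (batch, N, 13)
z = x.reshape(8, n_voxels * in_features, 1)  # voxels become channel groups, length-1 sequence
h = subnets(z)                               # (8, N * hidden, 1)
out = head(h.flatten(start_dim=1))           # (8, 64)
```

This is equivalent in what it computes to the stacked-weight idea, just expressed with built-in layers. The reshape relies on `x` being contiguous so that voxel v's features land in channels `v*13` to `v*13 + 12`.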

I'd be interested to read constructive suggestions and comments on any of this.