Parallelize the application of multiple CNNs to multiple images


I’m testing out a rather unusual method that requires applying n different CNNs to n different images. It is a one-to-one mapping where cnn_1 is applied to img_1, cnn_2 is applied to img_2, and so on. Currently I loop through each of the n networks and images to perform the forward pass, which gets rather slow for large n. Is there a way to parallelize this operation so that all n forward passes happen simultaneously?

I hope you have enough GPUs and memory for your system to scale well as n increases!

Otherwise, you can use the torch.distributed.launch module. Take a look at this snippet; it might give you a better idea of how you can easily parallelize and even distribute compute using PyTorch.

Thanks @LeviViana! If my understanding of torch.distributed.launch is correct, it is used to distribute compute over multiple GPUs? However, I’d like to parallelize this operation on a single GPU. Ideally, it would be akin to torch.bmm, the only difference being that the CNN forward pass is batched instead of a matrix multiplication.

Here is a snippet explaining how you can achieve this using grouped convolutions:

import torch

image_1 = torch.rand(1, 3, 50, 50)
image_2 = torch.rand(1, 3, 50, 50)

conv_weight_1 = torch.rand([4, 3, 3, 3]) # shape: (out_channels=4, in_channels=3, kernel_h=3, kernel_w=3)
conv_weight_2 = torch.rand([4, 3, 3, 3]) # shape: (out_channels=4, in_channels=3, kernel_h=3, kernel_w=3)

conv_bias_1 = torch.rand([4]) # one bias per output channel (out_channels=4)
conv_bias_2 = torch.rand([4]) # one bias per output channel (out_channels=4)

# 1st case -> not fused convolutions:

res_1 = torch.nn.functional.conv2d(image_1, conv_weight_1, conv_bias_1)
res_2 = torch.nn.functional.conv2d(image_2, conv_weight_2, conv_bias_2)

res_not_fused = torch.cat((res_1, res_2), dim=0) # shape: (2, 4, 48, 48)

# 2nd case -> fused convolutions:

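conv_weight = torch.cat((conv_weight_1, conv_weight_2), dim=0) # shape: (8, 3, 3, 3)
conv_bias = torch.cat((conv_bias_1, conv_bias_2)) # shape: (8,)

batch = torch.cat((image_1, image_2), dim=1) # stack the images along the channel dim: (1, 6, 50, 50)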

res_fused = torch.nn.functional.conv2d(batch, conv_weight, conv_bias, groups=2)
res_fused = res_fused.view(2, 4, 48, 48)

res_fused.allclose(res_not_fused) # <- True: the results match up to a small floating-point difference

I needed to use torch.allclose instead of torch.equal because there is a very small numerical difference between the two methods.
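The same trick extends from 2 to n convolutions, as long as every network uses the same input shape, channel counts, and kernel size. Here is a minimal sketch of that generalization; the helper name `fused_conv2d` and the stacked tensor layout are my own assumptions for illustration, not part of PyTorch:

```python
import torch
import torch.nn.functional as F


def fused_conv2d(images, weights, biases):
    """Apply n different convolutions to n different images in one call.

    Hypothetical helper (not a PyTorch API). Expected shapes:
      images:  (n, c_in, h, w)
      weights: (n, c_out, c_in, kh, kw)
      biases:  (n, c_out)
    """
    n, c_in, h, w = images.shape
    _, c_out, _, kh, kw = weights.shape

    # Fold the n images into the channel dimension of a single "image".
    batch = images.reshape(1, n * c_in, h, w)
    # Concatenate the n filter banks along the output-channel dimension.
    weight = weights.reshape(n * c_out, c_in, kh, kw)
    bias = biases.reshape(n * c_out)

    # groups=n makes group i see only image i's channels and filter bank i.
    out = F.conv2d(batch, weight, bias, groups=n)
    return out.reshape(n, c_out, out.shape[-2], out.shape[-1])


n = 3
images = torch.rand(n, 3, 50, 50)
weights = torch.rand(n, 4, 3, 3, 3)
biases = torch.rand(n, 4)

fused = fused_conv2d(images, weights, biases)

# Reference: the plain loop over networks and images.
looped = torch.cat(
    [F.conv2d(images[i:i + 1], weights[i], biases[i]) for i in range(n)], dim=0
)
print(fused.shape, torch.allclose(fused, looped, atol=1e-5))
```

For a multi-layer CNN, you can keep the activations in the fused (1, n * channels, h, w) layout between layers and only reshape at the end, since element-wise activations such as ReLU and pooling layers act on each channel independently.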

Apologies, I just saw this! Thank you so much for the snippet! :)