Parallelize four conv2d layers on GPU

I have an input tensor of size [1,3,4,100,100], which corresponds to [batch size, channels, depth, width, height].
I want to apply a 2d convolution to each depth slice, so I need four 2d convolutions. After doing this,
I stack the results back along the depth dimension.
Code:

 x = torch.ones([1,3,4,100,100], dtype=torch.float32).cuda()
c_1 = nn.Conv2d(in_channels=3, out_channels=100, kernel_size=[3,3], padding=1).cuda() # all conv layers have the same parameters
c_2 = nn.Conv2d(3,100,3,padding=1).cuda()
c_3 = nn.Conv2d(3,100,3,padding=1).cuda()
c_4 = nn.Conv2d(3,100,3,padding=1).cuda()
#### Can I do this in parallel?
pred_1 = c_1(x[:,:,0,:])
pred_2 = c_2(x[:,:,1,:])
pred_3 = c_3(x[:,:,2,:])
pred_4 = c_4(x[:,:,3,:])
###
pred = torch.stack([pred_1,pred_2,pred_3,pred_4], dim=2)
# pred has size [1,100,4,100,100]

This calculates the convolutions one after the other, but each convolution is completely independent of the others; they all take separate inputs.
Is it possible to calculate all the convolutions at the same time on the GPU?

It’s a bit hacky, but you could parallelize this with a single Conv2d using the groups parameter, given that all the convolution layers have the same number of input and output channels, the same kernel size, etc.

A small example:
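Roughly along these lines (a sketch with the shapes from your post hard-coded; it assumes all four layers share the kernel size and padding above). The four layers are fused into one grouped Conv2d with 4*3 input channels and 4*100 output channels, the depth dimension is folded into the channel dimension before the convolution, and the output channels are split back into depth afterwards:

import torch
import torch.nn as nn

x = torch.ones([1, 3, 4, 100, 100], dtype=torch.float32).cuda()

# One grouped conv replaces the four separate layers: groups=4 splits the
# 12 input channels into 4 independent groups of 3 channels, and each group
# produces 100 of the 400 output channels, like 4 independent Conv2d(3, 100).
c = nn.Conv2d(in_channels=4 * 3, out_channels=4 * 100,
              kernel_size=3, padding=1, groups=4).cuda()

# Fold depth into the channel dimension: [1, 3, 4, 100, 100] -> [1, 4*3, 100, 100],
# so the 3 channels of each depth slice form one contiguous group.
x_flat = x.permute(0, 2, 1, 3, 4).reshape(1, 4 * 3, 100, 100)

out = c(x_flat)  # [1, 400, 100, 100]

# Split the 400 output channels back into (depth, channels) and restore
# the original layout: [1, 100, 4, 100, 100].
pred = out.view(1, 4, 100, 100, 100).permute(0, 2, 1, 3, 4)
print(pred.shape)  # torch.Size([1, 100, 4, 100, 100])

If you want the fused layer to reproduce your existing c_1 ... c_4 exactly, you can copy their parameters into it; the grouped weight has shape [400, 3, 3, 3], which is just the four [100, 3, 3, 3] weights concatenated along dim 0:

with torch.no_grad():
    c.weight.copy_(torch.cat([c_1.weight, c_2.weight, c_3.weight, c_4.weight], dim=0))
    c.bias.copy_(torch.cat([c_1.bias, c_2.bias, c_3.bias, c_4.bias], dim=0))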