I have an input tensor of size [1,3,4,100,100], which corresponds to [batch size, channels, depth, width, height].

I want to apply a separate 2D convolution to each depth slice, so I need four 2D convolutions. Afterwards, I stack the results back along the depth dimension.

Code:

```
import torch
import torch.nn as nn

x = torch.ones([1, 3, 4, 100, 100], dtype=torch.float32).cuda()
# All conv layers have the same hyperparameters
c_1 = nn.Conv2d(in_channels=3, out_channels=100, kernel_size=[3, 3], padding=1).cuda()
c_2 = nn.Conv2d(3, 100, 3, padding=1).cuda()
c_3 = nn.Conv2d(3, 100, 3, padding=1).cuda()
c_4 = nn.Conv2d(3, 100, 3, padding=1).cuda()
#### Can I do this in parallel?
pred_1 = c_1(x[:, :, 0, :])
pred_2 = c_2(x[:, :, 1, :])
pred_3 = c_3(x[:, :, 2, :])
pred_4 = c_4(x[:, :, 3, :])  # fixed: previously reused c_3 here
###
pred = torch.stack([pred_1, pred_2, pred_3, pred_4], dim=2)
# pred has size [1, 100, 4, 100, 100]
```

This computes the convolutions one after another.

But the convolutions are completely independent of each other; each one takes a separate input.

Is it possible to compute all of the convolutions at the same time on the GPU?
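One possible direction, sketched here under assumptions rather than as a confirmed answer: fold the depth dimension into the channel dimension and run a single grouped convolution (`groups=4`), so that each group sees only the channels of one depth slice. The names `convs`, `fused`, `x_folded`, `pred_loop`, and `pred_fused` are hypothetical, the weights are copied over only to check that both versions agree, and whether this is actually faster would have to be measured.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

B, C_in, D, H, W = 1, 3, 4, 100, 100
C_out = 100

x = torch.randn(B, C_in, D, H, W, device=device)

# Reference: four independent 2D convolutions, one per depth slice.
convs = [nn.Conv2d(C_in, C_out, 3, padding=1).to(device) for _ in range(D)]

# Hypothetical fused layer: one grouped convolution with D groups.
# Group d reads input channels [d*C_in, (d+1)*C_in) and writes
# output channels [d*C_out, (d+1)*C_out).
fused = nn.Conv2d(D * C_in, D * C_out, 3, padding=1, groups=D).to(device)

with torch.no_grad():
    # Grouped weight has shape [D*C_out, C_in, 3, 3], so the per-slice
    # weights and biases can simply be concatenated along dim 0.
    fused.weight.copy_(torch.cat([c.weight for c in convs], dim=0))
    fused.bias.copy_(torch.cat([c.bias for c in convs], dim=0))

    # [B, C_in, D, H, W] -> [B, D, C_in, H, W] -> [B, D*C_in, H, W]
    x_folded = x.permute(0, 2, 1, 3, 4).reshape(B, D * C_in, H, W)
    y = fused(x_folded)  # [B, D*C_out, H, W]

    # [B, D*C_out, H, W] -> [B, D, C_out, H, W] -> [B, C_out, D, H, W]
    pred_fused = y.reshape(B, D, C_out, H, W).permute(0, 2, 1, 3, 4)

    # Sequential version from the question, for comparison.
    pred_loop = torch.stack([convs[d](x[:, :, d]) for d in range(D)], dim=2)

max_abs_diff = (pred_fused - pred_loop).abs().max().item()
print(pred_fused.shape, max_abs_diff)
```

If the two results agree up to floating-point tolerance, the grouped layer can replace the four separate layers in a single kernel launch; in real use it would be trained directly instead of having its weights copied from separate modules.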