Mixed Convolutions

Which dimension would you like to squeeze?
The kernels will use all channels (in the default setup with groups=1) in both cases.
However, their spatial size and stride differ, as the kernels will slide over:

  • the height and width in nn.Conv2d
  • the depth, height, and width in nn.Conv3d
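
To make the difference concrete, here is a small sketch comparing the two layers. The tensor sizes and layer arguments are arbitrary values picked for illustration, not taken from your model:

```python
import torch
import torch.nn as nn

# All sizes below are arbitrary and only for illustration.
x2d = torch.randn(2, 3, 16, 16)     # [batch, channels, height, width]
x3d = torch.randn(2, 3, 8, 16, 16)  # [batch, channels, depth, height, width]

conv2d = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3)
conv3d = nn.Conv3d(in_channels=3, out_channels=6, kernel_size=3)

# Weight layout: [out_channels, in_channels // groups, *kernel_size].
# With groups=1 each kernel spans all 3 input channels in both layers.
print(conv2d.weight.shape)  # torch.Size([6, 3, 3, 3])
print(conv3d.weight.shape)  # torch.Size([6, 3, 3, 3, 3])

out2d = conv2d(x2d)  # kernel slides over height and width
out3d = conv3d(x3d)  # kernel slides over depth, height, and width
print(out2d.shape)   # torch.Size([2, 6, 14, 14])
print(out3d.shape)   # torch.Size([2, 6, 6, 14, 14])
```

Note that the nn.Conv3d output keeps a (shrunken) depth dimension, while the nn.Conv2d output has none.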

If you set out_channels=1 for the last nn.Conv3d layer, you could squeeze the channel dimension (dim1) of its output.
The next nn.Conv2d layer would then use the depth dimension as the new channel dimension, so its in_channels would have to match the depth size.
Is that what you would like to achieve?
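
A minimal sketch of this approach, assuming a 5-dimensional input in the [batch, channels, depth, height, width] layout. The shapes, padding, and out_channels of the nn.Conv2d layer are made-up values for the example:

```python
import torch
import torch.nn as nn

# Arbitrary example input: [batch, channels, depth, height, width]
x = torch.randn(2, 3, 8, 16, 16)

# Last 3D layer returns a single channel; padding=1 with kernel_size=3
# keeps depth, height, and width unchanged.
conv3d = nn.Conv3d(in_channels=3, out_channels=1, kernel_size=3, padding=1)
out3d = conv3d(x)
print(out3d.shape)  # torch.Size([2, 1, 8, 16, 16])

# Remove the channel dimension, so depth moves into dim1.
squeezed = out3d.squeeze(1)
print(squeezed.shape)  # torch.Size([2, 8, 16, 16])

# in_channels of the 2D layer has to equal the depth size.
conv2d = nn.Conv2d(in_channels=squeezed.size(1), out_channels=4,
                   kernel_size=3, padding=1)
out2d = conv2d(squeezed)
print(out2d.shape)  # torch.Size([2, 4, 16, 16])
```

Passing the dimension explicitly via squeeze(1) avoids accidentally dropping the batch dimension when the batch size is 1.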