Feed the output of Conv2d into a Conv3d layer to get a 5-dimensional output?

I am working on a project based on the OpenPose research paper that I read two weeks ago. In that paper, the model is supposed to give a 5-dimensional output. For example, torch.nn.Conv2d() gives a 4-D output of shape (batch_size, n_channels, height, width). What I need is an output of shape (batch_size, n_channels, height, width, 2). Here 2 is a fixed number, not subject to any change: each entry is a 2-dimensional vector, so for each channel at every pixel position there are two values, hence the added dimension.
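For reference, here is a minimal sketch of what a standard Conv2d produces, next to the target shape (the channel counts and spatial sizes below are made up for illustration):

```python
import torch
import torch.nn as nn

# An ordinary 2-D convolution: output is 4-D, (batch, channels, height, width).
conv = nn.Conv2d(in_channels=3, out_channels=19, kernel_size=3, padding=1)
x = torch.randn(8, 3, 46, 46)

out = conv(x)
print(out.shape)  # torch.Size([8, 19, 46, 46]) -- 4-D

# The goal is a 5-D tensor of shape (8, 19, 46, 46, 2) instead,
# i.e. a 2-vector per channel per pixel.
```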

What would be the best way to do this? I thought about having two separate branches, one for each vector component, but the network is very deep and I would like to be as computationally efficient as possible.

It’s not a matter of using 3D convolutions to get one more output dimension. Those channels will have a meaning, and depending on that meaning you will need different architectures.
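That said, if the two values per channel are just two components of a vector field predicted jointly (as with part affinity fields in OpenPose), one common pattern is to let a single Conv2d predict `2 * n_channels` channels and reshape. This is a hedged sketch, not the paper's exact head; the module name `VectorFieldHead` and all sizes here are hypothetical:

```python
import torch
import torch.nn as nn

class VectorFieldHead(nn.Module):
    """Hypothetical sketch: predict a 2-D vector per channel per pixel
    by doubling a Conv2d's output channels and reshaping afterwards."""

    def __init__(self, in_channels, n_channels):
        super().__init__()
        self.n_channels = n_channels
        # One convolution predicts both vector components for every channel.
        self.conv = nn.Conv2d(in_channels, n_channels * 2,
                              kernel_size=3, padding=1)

    def forward(self, x):
        out = self.conv(x)                    # (B, n_channels * 2, H, W)
        b, _, h, w = out.shape
        out = out.view(b, self.n_channels, 2, h, w)
        return out.permute(0, 1, 3, 4, 2)     # (B, n_channels, H, W, 2)

head = VectorFieldHead(in_channels=64, n_channels=19)
x = torch.randn(4, 64, 46, 46)
print(head(x).shape)  # torch.Size([4, 19, 46, 46, 2])
```

This avoids two separate branches: the extra dimension costs only double the output channels of the final convolution, and the backbone is shared. Note the permuted result is non-contiguous; call `.contiguous()` if a later op requires it.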