I am working on a project based on the OpenPose research paper that I read two weeks ago. In that paper, the model is supposed to give a 5-dimensional output. For example, torch.nn.Conv2d gives a 4-D output of the following shape: (Batch_size, n_channels, input_height, input_width). What I need is an output of the following shape: (Batch_size, n_channels, input_height, input_width, 2). Here, 2 is a fixed number not subject to any change: each entry is a 2-dimensional vector, so for each channel at every pixel position there are 2 values, hence the added dimension.
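To make the shapes concrete, here is a minimal example (the batch size 8, channel count 19, and 32×32 spatial size are placeholder values I chose, not values from the paper):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 32, 32)                      # (Batch_size, n_channels, H, W) input
conv = nn.Conv2d(3, 19, kernel_size=3, padding=1)  # padding=1 preserves H and W
out = conv(x)
print(out.shape)                                   # 4-D: torch.Size([8, 19, 32, 32])

# What I need instead is a 5-D tensor, where the trailing 2 holds
# the 2-dimensional vector for each channel at each pixel position:
target_shape = (8, 19, 32, 32, 2)
```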
What would be the best way to do this? I thought about having 2 separate branches, one for each vector component, but the network is very deep and I would like to be as computationally efficient as possible.
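For reference, the two-branch idea I mentioned would look roughly like this (the input channel count of 64 and the 1×1 kernels are hypothetical choices for illustration, not from my actual network):

```python
import torch
import torch.nn as nn

class TwoBranchHead(nn.Module):
    """Sketch of the two-branch idea: one conv per vector component."""

    def __init__(self, in_ch=64, n_channels=19):
        super().__init__()
        # Each branch predicts one component of the 2-D vector per channel.
        self.branch_x = nn.Conv2d(in_ch, n_channels, kernel_size=1)
        self.branch_y = nn.Conv2d(in_ch, n_channels, kernel_size=1)

    def forward(self, feat):
        # Stack the two component maps along a new trailing dimension,
        # producing (Batch_size, n_channels, H, W, 2).
        return torch.stack([self.branch_x(feat), self.branch_y(feat)], dim=-1)

head = TwoBranchHead()
out = head(torch.randn(8, 64, 32, 32))
print(out.shape)  # torch.Size([8, 19, 32, 32, 2])
```

This gives the shape I want, but it duplicates the head computation, which is exactly the cost I am hoping to avoid in a deep network.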