I’m want to create a network which outputs a tensor shaped like this:
batch x X x 6: where X is a number which is between 1 and 50
Sort of like a network which predicts the bounding boxes for a give shape in an image. But Im not sure which layer should be the last, because for example a nn.Linear() have a fixed output size.
Assume that a conv layer’s output is bchw, you can set the last channel is 6, and using global pooling by grouped to change output from hw to h1 or w1, then you do some dimensionality reduction. you will get the bX6©.
Thank you for your reply, but I need to accommodate for the pooling layer needs to have an output size of b x c x (1,1) but it could be b x c x (1,5) or b x c x (1,10)
Or you suggest I set the output shape dynamically in each forward pass?
No，i just suggest that you can get the shape what you want in the last layer.
But this is the issue, sometimes I need 4 * 6 detection for an image or sometime 8 * 6. Is this possible?