Why is a final fully connected layer needed in a CNN?

While building an image classifier, we use a stack of convolution layers and then put a fully connected layer at the end.
If we are building a 10-class classifier, can't we instead use a final convolution layer whose output has dimensions batch_size * w * h * n_channels, flatten it, and apply a softmax activation to get the probability of each class?

It would be great if someone could share the reason for using the final FC layer.

It will work exactly like an FC layer.
As a side note, in PyTorch the outputs are ordered (batch_size, channels, height, width), in contrast to the NumPy convention of (height, width, channels).
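To see why the two are equivalent: a "valid" convolution whose kernel has the same spatial size as its input produces a single number per output channel, which is exactly a dot product between the flattened kernel and the flattened input, i.e. one row of a fully connected layer. A minimal NumPy sketch (shapes and seed are arbitrary assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, n_classes = 3, 4, 4, 10  # assumed toy dimensions

x = rng.standard_normal((C, H, W))
# One full-extent kernel per class: convolving with a kernel the same
# size as the input yields a single scalar per class (no sliding).
kernels = rng.standard_normal((n_classes, C, H, W))

# "Convolution" view: elementwise multiply and sum per kernel.
conv_out = np.array([(x * k).sum() for k in kernels])

# Fully connected view: flatten kernels into a weight matrix and
# flatten the input into a vector, then matrix-multiply.
fc_out = kernels.reshape(n_classes, -1) @ x.reshape(-1)

assert np.allclose(conv_out, fc_out)  # identical results
```

The only practical difference is bookkeeping: the conv view keeps the (channels, height, width) layout until the very end, while the FC view flattens up front.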

It makes no sense. Convolutions keep spatial structure. Flattening everything would lead to a worse solution, because you would have to find a kernel that, once applied to the tensor and flattened, provides good features. That is harder than applying a fully connected layer.

Sometimes people pass global max pooling or global average pooling features as classification scores.
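That pooling idea can be sketched as follows: if the last conv layer emits one feature map per class, global average pooling collapses each map to a single score, and softmax turns the scores into probabilities. A hedged NumPy illustration (the shapes and random inputs are assumptions, not a real network):

```python
import numpy as np

rng = np.random.default_rng(1)
n_classes, H, W = 10, 7, 7  # assumed toy dimensions

# Suppose the final conv layer outputs one feature map per class.
feature_maps = rng.standard_normal((n_classes, H, W))

# Global average pooling: mean over spatial dims gives one score per class.
scores = feature_maps.mean(axis=(1, 2))

# Softmax over the pooled scores yields class probabilities.
exp = np.exp(scores - scores.max())  # subtract max for numerical stability
probs = exp / exp.sum()

assert probs.shape == (n_classes,)
assert np.isclose(probs.sum(), 1.0)
```

This removes the fully connected layer entirely and makes the head independent of the input's spatial size.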

But aren’t we just trying to find that kernel during backprop anyway?