Why is a final fully connected layer needed in a CNN?

While building an image classifier, we use a stack of convolution layers and then put a fully connected layer at the end.
If we are building a 10-class classifier, can't we instead use a final convolution layer whose output has dimensions batch_size * w * h * n_channels, flatten it, and apply a softmax activation to get the probability of each class?

It would be great if someone could share the reason for using the final FC layer.

It will work exactly like an FC layer.
As a side note, in PyTorch the outputs are ordered (batch_size, channels, height, width), in contrast to the NumPy convention of (height, width, channels).
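To see why the two are equivalent: a "valid" convolution whose kernel has the same spatial size as its input produces a single number per output channel, which is exactly a dot product between the flattened kernel and the flattened input, i.e. one row of a fully connected layer. A minimal NumPy sketch (shapes and seed are arbitrary assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, n_classes = 3, 4, 4, 10  # assumed toy dimensions

x = rng.standard_normal((C, H, W))
# One full-extent kernel per class: convolving with a kernel the same
# size as the input yields a single scalar per class (no sliding).
kernels = rng.standard_normal((n_classes, C, H, W))

# "Convolution" view: elementwise multiply and sum per kernel.
conv_out = np.array([(x * k).sum() for k in kernels])

# Fully connected view: flatten kernels into a weight matrix and
# flatten the input into a vector, then matrix-multiply.
fc_out = kernels.reshape(n_classes, -1) @ x.reshape(-1)

assert np.allclose(conv_out, fc_out)  # identical results
```

The only practical difference is bookkeeping: the conv view keeps the (channels, height, width) layout until the very end, while the FC view flattens up front.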

It makes no sense. Convolutions keep spatial structure. Flattening everything would lead to a worse solution, because you would have to find a kernel that, once applied to the tensor and flattened, provides good features. That is harder than applying a fully connected layer.

Sometimes people pass global max pooling or global average pooling features as classification scores.
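That pooling idea can be sketched as follows: if the last conv layer emits one feature map per class, global average pooling collapses each map to a single score, and softmax turns the scores into probabilities. A hedged NumPy illustration (the shapes and random inputs are assumptions, not a real network):

```python
import numpy as np

rng = np.random.default_rng(1)
n_classes, H, W = 10, 7, 7  # assumed toy dimensions

# Suppose the final conv layer outputs one feature map per class.
feature_maps = rng.standard_normal((n_classes, H, W))

# Global average pooling: mean over spatial dims gives one score per class.
scores = feature_maps.mean(axis=(1, 2))

# Softmax over the pooled scores yields class probabilities.
exp = np.exp(scores - scores.max())  # subtract max for numerical stability
probs = exp / exp.sum()

assert probs.shape == (n_classes,)
assert np.isclose(probs.sum(), 1.0)
```

This removes the fully connected layer entirely and makes the head independent of the input's spatial size.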

But aren’t we just trying to find that kernel during backprop anyway?