Global average pooling misunderstanding

Hello,

I would like to replace my fully connected layer with a global average pooling layer.
I have 10 classes, and my last convolutional layer outputs a 3D tensor of shape (16, 25, 32).


last_conv = tensor.view(16, 25, 32)
# something to do here !!
final_layer = self.global_average_pooling(last_conv)
output = self.softmax(final_layer)

My question is: how do I go from a 3D tensor of shape (16, 25, 32) to 10 (the number of classes) through global average pooling?

Thank you

Hello,

Global average pooling turns your 3D tensor of shape (16, 25, 32) into a tensor of shape (1, 1, 32), assuming 32 corresponds to the channel dimension. To produce an output of size 10, you have to reshape this to a 1D vector of size 32 and apply a linear layer.
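
For example, here is a minimal sketch (assuming the 32 channels are the last dimension of your (16, 25, 32) tensor; the variable names are just placeholders):

import torch
import torch.nn as nn

last_conv = torch.randn(16, 25, 32)     # dummy feature maps: 16 x 25 spatial, 32 channels
pooled = last_conv.mean(dim=(0, 1))     # global average pooling -> one value per channel, shape (32,)
fc = nn.Linear(32, 10)                  # linear (fully connected) layer: 32 -> 10 classes
logits = fc(pooled)                     # shape (10,)
probs = torch.softmax(logits, dim=0)    # class probabilities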

By a linear layer, do you mean a fully connected layer?

Yes, fully connected

In the original paper, global average pooling is supposed to get rid of the fully connected layer. I don’t want to use a fully connected layer. Link: https://arxiv.org/pdf/1312.4400.pdf, page 4, section 3.2, paragraph 3:

"
In this paper, we propose another strategy called global average pooling to replace the traditional
fully connected layers in CNN. The idea is to generate one feature map for each corresponding
category of the classification task in the last mlpconv layer. Instead of adding fully connected layers
on top of the feature maps, we take the average of each feature map, and the resulting vector is fed
directly into the softmax layer. One advantage of global average pooling over the fully connected
layers is that it is more native to the convolution structure by enforcing correspondences between
feature maps and categories. Thus the feature maps can be easily interpreted as categories confidence
maps. Another advantage is that there is no parameter to optimize in the global average pooling
thus overfitting is avoided at this layer. Futhermore, global average pooling sums out the spatial
information, thus it is more robust to spatial translations of the input.
"

Sorry, I haven’t read that paper. Maybe the number of channels after the last convolution has to equal the number of outputs; do the authors mention this? Anyway, it’s very common to use global average pooling as a replacement for flatten before a fully connected layer; see for instance this issue on Keras’ GitHub.
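
If you really want to drop the fully connected layer as in the paper, one way I can imagine doing it (just a sketch; the class name, layer names, and channel counts are placeholders) is to make the last convolution emit one feature map per class, average each map over space, and feed that straight into softmax:

import torch
import torch.nn as nn

class GapHead(nn.Module):
    def __init__(self, in_channels=32, num_classes=10):
        super().__init__()
        # 1x1 conv so the network ends with num_classes feature maps
        self.class_conv = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, x):                 # x: (N, 32, H, W), channels-first as PyTorch expects
        x = self.class_conv(x)            # (N, 10, H, W), one feature map per class
        x = x.mean(dim=(2, 3))            # global average pooling -> (N, 10)
        return torch.softmax(x, dim=1)    # class probabilities

model = GapHead()
dummy = torch.randn(4, 32, 16, 25)        # batch of 4 feature maps
print(model(dummy).shape)                 # torch.Size([4, 10])

Note that if you train with nn.CrossEntropyLoss you would return the pooled values directly instead of applying softmax in forward.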