I am trying to use global average pooling, but I have no idea how to implement it in PyTorch. Global average pooling is described briefly as follows:

It means that if you have a 3D tensor of shape 8×8×128 at the end of your last convolution, the traditional approach is to flatten it into a 1D vector of size 8*8*128. You then add one or more fully connected layers and, at the end, a softmax layer that reduces the size to the 10 classification categories and applies the softmax operator.

With global average pooling, you instead have a 3D tensor of shape 8×8×10 and compute the average over each 8×8 slice, ending up with a tensor of shape 1×1×10 that you reshape into a 1D vector of size 10. You then apply a softmax operator directly, with no operation in between. The tensor before the average pooling is supposed to have as many channels as the model has classification categories.
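Concretely, the averaging step alone, as I understand it, would look like this (the tensor shapes here are just the ones from the description above; PyTorch uses channels-first layout, so 10 channels of 8×8):

```python
import torch

# Hypothetical feature map: batch of 1, 10 channels, 8x8 spatial
x = torch.randn(1, 10, 8, 8)

# Global average pooling = mean over the two spatial dimensions
pooled = x.mean(dim=(2, 3))  # shape (1, 10)

# Equivalent result using the built-in adaptive pooling layer
pooled2 = torch.nn.AdaptiveAvgPool2d(1)(x).view(1, 10)
```

Both give one number per channel, i.e. one value per class.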

So, to me it sounds like in one case you do

```
tensor = tensor.view(-1, 8*8*128)  # flatten the 8x8x128 feature map
tensor = self.linear(tensor)       # nn.Linear(8*8*128, 10) -> size 10
tensor = self.softmax(tensor)      # nn.Softmax(dim=1)
```

and, in the other, you do

```
tensor = self.conv(tensor)             # nn.Conv2d(128, 10, kernel_size=1) -> [10, 8, 8]
tensor = self.global_avg_pool(tensor)  # whatever this is, to get [10, 1, 1]
tensor = tensor.squeeze()              # to just get a vector of size [10]
tensor = self.softmax(tensor)
```

Here are the questions:

- Are the above examples correct, keeping in mind the description of global average pooling?
- How can I do the global average pooling? Should I use the functional module?
- The paper I am trying to reproduce (residual nets) says that:

> The network ends with a global average pooling, a 10-way fully-connected layer, and softmax.

But this does not make sense to me. If the global average pooling already produces one value per class, why do they need the 10-way fully connected layer?
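If I read that sentence literally, the pooling comes before the fully connected layer, so the last conv block would not need 10 channels at all. Assuming 64 feature channels out of the last conv block (the number I believe the CIFAR-10 ResNet uses; that is my assumption, not something stated above), it would look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumption: 64 feature channels from the last conv block
x = torch.randn(1, 64, 8, 8)
x = F.adaptive_avg_pool2d(x, 1).view(1, 64)  # global average pooling -> (1, 64)
fc = nn.Linear(64, 10)                       # the "10-way fully-connected layer"
logits = fc(x)                               # (1, 10)
probs = F.softmax(logits, dim=1)
```

Is that the intended reading, i.e. the fc layer maps the pooled feature channels to the 10 classes instead of a 1×1 conv doing it beforehand?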