How to replace FC layers with GlobalPooling

I’ve written the network:

import torch
import torch.nn as nn


class ClassificationHead(nn.Module):
    """
    Classification of the image rotation angle.

    Args:
        input_size: the number of input features
    """

    def __init__(self, input_size: int):
        super().__init__()
        self.fc1 = nn.Linear(input_size, 16)
        self.relu1 = nn.ReLU()
        self.dropout1 = nn.Dropout(0.1)
        self.fc2 = nn.Linear(16, 4)  # 4 rotation classes

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.dropout1(x)
        x = self.fc2(x)
        return x

But I'd like to replace this behaviour with a single global pooling layer, and I can't figure out how at all.

I assume you define GlobalPooling as a pooling operation returning an activation with a spatial size of 1x1?
If so, nn.AdaptiveAvgPool2d(output_size=(1, 1)) should work:

import torch
import torch.nn as nn

x = torch.randn(2, 3, 224, 224)
pool = nn.AdaptiveAvgPool2d(output_size=(1, 1))
out = pool(x)
print(out.shape)
# torch.Size([2, 3, 1, 1])

I tried it but was confused by the output. I guess I just need to squeeze() it?

If you want to remove the spatial dimensions with a size of 1, then yes you can squeeze() the output.
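For example, continuing the snippet above (a small sketch; squeezing only the last two dimensions so a batch size of 1 is not accidentally removed):

```python
import torch
import torch.nn as nn

x = torch.randn(2, 3, 224, 224)
pool = nn.AdaptiveAvgPool2d(output_size=(1, 1))
out = pool(x)

# Squeeze only the 1x1 spatial dimensions, keeping the batch dimension intact.
out = out.squeeze(-1).squeeze(-1)
print(out.shape)
# torch.Size([2, 3])
```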


Thank you. I'm still figuring out how to extract as much information as possible without using FC layers, with convolutions only.

I think the layer proposed in the post should work, but the channel dimension is still uncontrolled, and I'm not sure how to make it match the number of classes (4).

This is what I'm currently thinking of:

import torch
import torch.nn as nn


class ClassificationHead(nn.Module):
    """
    Classification of the image rotation angle.

    Args:
        input_size: the number of channels of the input feature map
    """

    def __init__(self, input_size: int):
        super().__init__()
        # self.fc1 = nn.Linear(input_size, 24)
        # self.relu1 = nn.ReLU()
        # self.dropout1 = nn.Dropout(0.1)
        self.gap = nn.AdaptiveAvgPool2d((1, 1))
        self.flatten = nn.Flatten()
        self.fc2 = nn.Linear(input_size, 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.gap(x)      # [N, C, H, W] -> [N, C, 1, 1]
        x = self.flatten(x)  # [N, C, 1, 1] -> [N, C]
        x = self.fc2(x)      # [N, C] -> [N, 4]
        return x

Using an nn.Linear layer sounds like a valid option, or you could also apply an nn.Conv2d layer before the pooling layer.
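A minimal sketch of the conv-only variant (the class name and the in_channels=512 feature-map shape are just placeholders for illustration): a 1x1 convolution maps the channels to the 4 classes, and global average pooling then removes the spatial dimensions, so no nn.Linear layer is needed.

```python
import torch
import torch.nn as nn


class ConvClassificationHead(nn.Module):
    """Conv-only head: 1x1 conv to 4 class channels, then global average pooling."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 4, kernel_size=1)  # controls the channel dim
        self.gap = nn.AdaptiveAvgPool2d((1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)            # [N, C, H, W] -> [N, 4, H, W]
        x = self.gap(x)             # [N, 4, H, W] -> [N, 4, 1, 1]
        return torch.flatten(x, 1)  # [N, 4, 1, 1] -> [N, 4]


head = ConvClassificationHead(in_channels=512)
logits = head(torch.randn(2, 512, 7, 7))
print(logits.shape)
# torch.Size([2, 4])
```

Either order works in principle (conv then pool, or pool then a linear layer); the conv-first version keeps the head fully convolutional, so it also accepts inputs of varying spatial size.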