I assume you define GlobalPooling as a pooling operation returning an activation with a spatial size of 1x1?
If so, nn.AdaptiveAvgPool2d(output_size=(1, 1)) should work:
import torch
import torch.nn as nn

x = torch.randn(2, 3, 224, 224)
pool = nn.AdaptiveAvgPool2d(output_size=(1, 1))
out = pool(x)
print(out.shape)
# torch.Size([2, 3, 1, 1])
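As a quick sanity check, here is a small sketch showing that this layer is just the mean over the two spatial dimensions, and that it gives a 1x1 output for any input resolution while leaving the channel dimension unchanged. The second input size (37x53) and the variable names are only made up for illustration:

```python
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d(output_size=(1, 1))

# Same input shape as above: batch of 2, 3 channels, 224x224.
x = torch.randn(2, 3, 224, 224)
out = pool(x)

# Global average pooling is the mean over height and width (dims 2 and 3).
manual = x.mean(dim=(2, 3), keepdim=True)
matches_mean = torch.allclose(out, manual, atol=1e-6)
print(matches_mean)

# An arbitrary, non-square input still yields a 1x1 spatial size;
# the channel dimension (3 here) is passed through untouched.
x_small = torch.randn(2, 3, 37, 53)
out_small = pool(x_small)
print(out_small.shape)
# torch.Size([2, 3, 1, 1])
```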
Thank you. I’m still figuring out how to extract the maximum information using only conv layers, without any FC layers.
I think the layer proposed in the post should work, but it leaves the channel dimension uncontrolled, and I’m not sure how to make it 4 (i.e. the number of classes).
This is what I’m thinking of currently:
class ClassificationHead(nn.Module):
    """
    Classification of the image rotation angle.

    Args:
        input_size: the size of the input image
    """

    def __init__(self, input_size: int):
        super().__init__()
        # self.fc1 = nn.Linear(input_size, 24)
        # self.relu1 = nn.ReLU()
        # self.dropout1 = nn.Dropout(0.1)
        self.flatten = nn.Flatten()
        self.gap = nn.AdaptiveAvgPool2d((1, 1))
        self.fc2 = nn.Linear(input_size, 4)