Welcome to the forums!
There is a lot to unpack here.
- When you call `nn.AdaptiveAvgPool2d(1)`, you are telling PyTorch to pool whatever spatial size comes in down to 1x1, so the output of that layer is `(batch_size, channels, 1, 1)`. Let's assume that is what you want. Then you should call `flatten()` afterward:

```python
_x = self.global_avg_pool(inputs).flatten(1)
```
- When you get down to your convolutional layer, you will need to add those two singleton dims back:

```python
_x = self.conv(_x.unsqueeze(2).unsqueeze(3))
```
Note: in your case, applying a 1x1 conv to that unsqueezed `(batch_size, channels, 1, 1)` tensor is the mathematical equivalent of a Linear layer under the guise of a convolution layer.
Let's demonstrate that with the following code:

```python
import torch
import torch.nn as nn

model1 = nn.Linear(3, 4)
model2 = nn.Conv2d(3, 4, kernel_size=1)

# Copy the Linear weights into the 1x1 conv: (4, 3) -> (4, 3, 1, 1)
model2.weight.data = model1.weight.data.unsqueeze(2).unsqueeze(3)
model2.bias.data = model1.bias.data

dummy_data = torch.rand((2, 3))

outputs1 = model1(dummy_data)
# Add the two spatial dims for the conv, then remove them again
outputs2 = model2(dummy_data.unsqueeze(2).unsqueeze(3)).squeeze(3).squeeze(2)

print(outputs1)
print(outputs2)
print(outputs1.allclose(outputs2))  # True
```
TL;DR: you could just use a Linear layer and do the same thing. Just make sure to unsqueeze dims 2 and 3 before the final multiplication with the original inputs.
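Putting it all together with the Linear layer, here is a hedged sketch assuming the output is meant to scale the original feature map channel-wise (the layer sizes are made up, and `fc` is a hypothetical stand-in for your 1x1 conv):

```python
import torch
import torch.nn as nn

inputs = torch.rand(2, 8, 16, 16)  # hypothetical (batch, channels, H, W)

pool = nn.AdaptiveAvgPool2d(1)
fc = nn.Linear(8, 8)  # replaces the 1x1 conv

_x = fc(pool(inputs).flatten(1))      # (2, 8)
scale = _x.unsqueeze(2).unsqueeze(3)  # (2, 8, 1, 1), broadcasts over H and W
out = inputs * scale                  # channel-wise scaling of the original inputs
print(out.shape)  # torch.Size([2, 8, 16, 16])
```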