Welcome to the forums!
There is a lot to unpack here.
- When you call `nn.AdaptiveAvgPool2d(1)`, you are telling PyTorch to pool whatever spatial size comes in down to 1x1, so the output of that layer is `(batch_size, channels, 1, 1)`. Let's assume that is what you want. Then you should call `flatten()` afterward:

```python
_x = self.global_avg_pool(inputs).flatten(1)
```
- When you get down to your convolutional layer, you will need to add those two singleton dims back:

```python
_x = self.conv(_x.unsqueeze(2).unsqueeze(3))
```
Note: in your case, applying a 1x1 conv to that unsqueezed `(batch_size, channels, 1, 1)` tensor is the mathematical equivalent of a Linear layer under the guise of a convolution layer.
Let's demonstrate that with the following code:

```python
import torch
import torch.nn as nn

model1 = nn.Linear(3, 4)
model2 = nn.Conv2d(3, 4, kernel_size=1)

# Copy the Linear weights into the 1x1 conv: (4, 3) -> (4, 3, 1, 1)
model2.weight.data = model1.weight.data.unsqueeze(2).unsqueeze(3)
model2.bias.data = model1.bias.data

dummy_data = torch.rand((2, 3))

outputs1 = model1(dummy_data)
# Add the two spatial dims for the conv, then remove them again
outputs2 = model2(dummy_data.unsqueeze(2).unsqueeze(3)).squeeze(3).squeeze(2)

print(outputs1)
print(outputs2)
print(outputs1.allclose(outputs2))  # True
```
TL;DR: you could just use a Linear layer and do the same thing. Just make sure to unsqueeze dims 2 and 3 before the final multiplication with the original inputs.
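Putting it all together with the Linear layer, here is a hedged sketch assuming the output is meant to scale the original feature map channel-wise (the layer sizes are made up, and `fc` is a hypothetical stand-in for your 1x1 conv):

```python
import torch
import torch.nn as nn

inputs = torch.rand(2, 8, 16, 16)  # hypothetical (batch, channels, H, W)

pool = nn.AdaptiveAvgPool2d(1)
fc = nn.Linear(8, 8)  # replaces the 1x1 conv

_x = fc(pool(inputs).flatten(1))      # (2, 8)
scale = _x.unsqueeze(2).unsqueeze(3)  # (2, 8, 1, 1), broadcasts over H and W
out = inputs * scale                  # channel-wise scaling of the original inputs
print(out.shape)  # torch.Size([2, 8, 16, 16])
```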