What is the default initialization of a conv2d layer and linear layer?

Hey guys, when training models for an image classification task, I tried replacing the pretrained model's last fc layer with an nn.Linear layer in one model and an nn.Conv2d layer (with kernel_size=1 so it acts as an fc layer) in the other, and found that the two models perform differently. Specifically, the conv2d one always performs better on my task. I wonder whether this is because the two layers use different initialization methods, and what the default initialization method is for a conv2d layer and a linear layer in PyTorch. Thank you in advance.

This is the initialization for linear:

And this is the initialization for conv:


Thank you Richard. It seems that they are initialized the same way when acting as an fc layer. But that leaves me even more confused about why one model performs better than the other. :confused:
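The claim that both layers end up with the same initialization here can be checked directly. This is a minimal sketch, assuming a recent PyTorch release, where both `nn.Linear` and `nn.Conv2d` call `init.kaiming_uniform_(weight, a=math.sqrt(5))` in `reset_parameters()`. That works out to weights drawn from U(-1/sqrt(fan_in), 1/sqrt(fan_in)), and the bias is drawn from the same range. For a 1x1 kernel, fan_in = in_channels * 1 * 1, so both heads share the same bound. `num_classes = 10` is an arbitrary example value.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

in_features, num_classes = 512, 10  # num_classes is an arbitrary example value

linear = nn.Linear(in_features, num_classes, bias=True)
conv = nn.Conv2d(in_features, num_classes, kernel_size=1, bias=True)

# For a 1x1 kernel, fan_in = in_channels * 1 * 1, identical to Linear's in_features.
bound = 1.0 / math.sqrt(in_features)

for name, layer in [("linear", linear), ("conv1x1", conv)]:
    w, b = layer.weight, layer.bias
    # Weight and bias should both lie in [-bound, bound];
    # a uniform distribution on that range has std = bound / sqrt(3).
    print(f"{name}: weight shape={tuple(w.shape)}, "
          f"max|w|={w.abs().max().item():.4f}, max|b|={b.abs().max().item():.4f}, "
          f"std(w)={w.std().item():.4f}, bound={bound:.4f}")
```

Only the weight shape differs, (10, 512) vs (10, 512, 1, 1); the number of elements and the sampling range are the same.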

Could you give some information about the input and output shapes of the linear and conv layers?

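The pretrained model's feature map (after an avg-pooling layer) has shape (bs, 512, 1, 1), so for the nn.Conv2d layer it is

```python
self.fc = nn.Conv2d(512, num_classes, kernel_size=1, stride=1, padding=0, bias=True)
```

and in forward:

```python
# input x is the feature map, shape (bs, 512, 1, 1)
x = self.fc(x)               # (bs, num_classes, 1, 1)
out = x.view(x.size(0), -1)  # (bs, num_classes)
```

For nn.Linear:

```python
self.fc = nn.Linear(512, num_classes, bias=True)
```

and in forward:

```python
# flatten (bs, 512, 1, 1) -> (bs, 512) before the linear layer
x = self.fc(x.view(x.size(0), -1))  # (bs, num_classes)
```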
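To confirm that the two heads compute exactly the same function, here is a sketch that copies the `nn.Linear` parameters into the 1x1 `nn.Conv2d` and compares outputs. It assumes the (bs, 512, 1, 1) feature map described above; the batch size of 4 and `num_classes = 10` are arbitrary example values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bs, in_ch, num_classes = 4, 512, 10  # example sizes; num_classes is arbitrary

linear_head = nn.Linear(in_ch, num_classes, bias=True)
conv_head = nn.Conv2d(in_ch, num_classes, kernel_size=1, stride=1, padding=0, bias=True)

# A 1x1 kernel is just the (num_classes, in_ch) weight matrix with two
# trailing singleton dims, so the Linear parameters can be copied over directly.
with torch.no_grad():
    conv_head.weight.copy_(linear_head.weight.view(num_classes, in_ch, 1, 1))
    conv_head.bias.copy_(linear_head.bias)

x = torch.randn(bs, in_ch, 1, 1)  # stand-in for the pooled feature map

out_conv = conv_head(x).view(x.size(0), -1)      # (bs, num_classes)
out_linear = linear_head(x.view(x.size(0), -1))  # (bs, num_classes)

print(out_conv.shape, out_linear.shape)
print("max abs difference:", (out_conv - out_linear).abs().max().item())
```

With equal parameters the outputs agree up to floating-point rounding, so a consistent accuracy gap is more likely to come from run-to-run randomness or some other difference in the training setup than from the layer type itself.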

This looks good. Is the "conv model" performing better in every run?
You could try several different seeds and check whether the difference is random or systematic.

At the moment I don’t see any reason the conv layer should perform better than the linear layer.
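A minimal sketch of the seed check suggested above. `make_head` and `train_and_evaluate` are hypothetical helpers: the latter is a toy stand-in that fits each head on synthetic (bs, 512, 1, 1) features with random labels and returns training accuracy. In practice you would swap in your real training and validation loop; the point is to compare the mean and spread across seeds rather than a single run.

```python
import statistics

import torch
import torch.nn as nn
import torch.nn.functional as F


def make_head(head_type: str, in_ch: int, num_classes: int) -> nn.Module:
    """Hypothetical helper: builds either a Linear head or a 1x1 Conv2d head."""
    if head_type == "linear":
        # (bs, in_ch, 1, 1) -> (bs, in_ch) -> (bs, num_classes)
        return nn.Sequential(nn.Flatten(), nn.Linear(in_ch, num_classes))
    # (bs, in_ch, 1, 1) -> (bs, num_classes, 1, 1) -> (bs, num_classes)
    return nn.Sequential(nn.Conv2d(in_ch, num_classes, kernel_size=1), nn.Flatten())


def train_and_evaluate(head_type: str, seed: int, in_ch: int = 512,
                       num_classes: int = 10, steps: int = 100) -> float:
    """Hypothetical toy stand-in for a real training run."""
    torch.manual_seed(seed)  # seeds both the synthetic data and the layer init
    features = torch.randn(256, in_ch, 1, 1)
    labels = torch.randint(0, num_classes, (256,))
    head = make_head(head_type, in_ch, num_classes)
    opt = torch.optim.SGD(head.parameters(), lr=0.1)
    for _ in range(steps):  # full-batch gradient descent
        opt.zero_grad()
        loss = F.cross_entropy(head(features), labels)
        loss.backward()
        opt.step()
    with torch.no_grad():
        preds = head(features).argmax(dim=1)
    return (preds == labels).float().mean().item()


seeds = [0, 1, 2, 3, 4]
results = {h: [train_and_evaluate(h, s) for s in seeds] for h in ("linear", "conv")}
for h, accs in results.items():
    print(f"{h}: mean={statistics.mean(accs):.3f}, stdev={statistics.stdev(accs):.3f}")
```

If the gap between the two means is small compared to the per-seed spread, the "conv is better" observation is probably noise.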

Yeah, I also think it's a coincidence. Anyway, I'll run several more experiments and double-check my code. :neutral_face:

Hi, is the default initialization for conv2d Xavier initialization?
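One way to check this empirically is to compare the weight range of a default-initialized layer with the bound `nn.init.xavier_uniform_` would use. This sketch assumes the 512 -> 10 1x1 conv from this thread (`num_classes = 10` is arbitrary) and a recent PyTorch release, where the default should be U(-1/sqrt(fan_in), 1/sqrt(fan_in)), i.e. `kaiming_uniform_` with `a=math.sqrt(5)`, rather than Xavier's U(-sqrt(6/(fan_in + fan_out)), sqrt(6/(fan_in + fan_out))).

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv2d(512, 10, kernel_size=1)  # default initialization

kh, kw = conv.kernel_size
fan_in = conv.in_channels * kh * kw    # 512
fan_out = conv.out_channels * kh * kw  # 10

default_bound = 1.0 / math.sqrt(fan_in)             # kaiming_uniform_ with a=sqrt(5)
xavier_bound = math.sqrt(6.0 / (fan_in + fan_out))  # xavier_uniform_ with gain=1

observed = conv.weight.abs().max().item()
print(f"observed max |w| = {observed:.4f}")
print(f"default bound    = {default_bound:.4f}")
print(f"xavier bound     = {xavier_bound:.4f}")

# If Xavier is what you want, re-initialize explicitly.
xavier_conv = nn.Conv2d(512, 10, kernel_size=1)
nn.init.xavier_uniform_(xavier_conv.weight)
nn.init.zeros_(xavier_conv.bias)  # zero bias is a choice here, not part of Xavier
```

The observed maximum should sit just under the default bound and well below the Xavier bound.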

Hi, have you solved this problem?