Having a different number of in_features and out_features in nn.Linear?

I understand that out_features in nn.Linear is often lower than in_features in order to compress the representation into more meaningful features, but sometimes I see out_features higher than in_features, and sometimes they are equal.

I noticed in VGG19 that the first two of the last three FC layers have the same 4096 output channels.
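For reference, the classifier head of torchvision's vgg19 looks roughly like this when printed:

(classifier): Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=4096, out_features=1000, bias=True)
)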

In the Swin Transformer we have:

Sequential(
      (0): SwinTransformerBlockV2(
        (norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): ShiftedWindowAttentionV2(
          (qkv): Linear(in_features=768, out_features=2304, bias=True) #Higher
          (proj): Linear(in_features=768, out_features=768, bias=True) #Equal
          (cpb_mlp): Sequential(
            (0): Linear(in_features=2, out_features=512, bias=True)
            (1): ReLU(inplace=True)
            (2): Linear(in_features=512, out_features=24, bias=False) 
          )

Does this help the network in any way? Specifically, I want to ask:

  1. What are the purposes of having higher, equal, or lower out_features than in_features in a network?
  2. Can you point me to papers that discuss this, and to network architectures that use it?

I have done some experiments on the last layers of my custom network, using a layer with higher out_features than in_features, followed by an equal-sized layer, and then a lower-dimensional layer for the output (a sketch of this head is shown below). It sometimes gives better results.
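As a rough sketch of what I tried (the sizes here are just placeholders, not the actual dimensions of my network):

import torch
import torch.nn as nn

# Placeholder sketch of the head described above: expand, keep equal, then reduce.
head = nn.Sequential(
    nn.Linear(256, 512),   # higher out_features than in_features
    nn.ReLU(inplace=True),
    nn.Linear(512, 512),   # equal in_features and out_features
    nn.ReLU(inplace=True),
    nn.Linear(512, 10),    # lower out_features for the final prediction
)

x = torch.randn(8, 256)    # batch of 8 feature vectors
print(head(x).shape)       # torch.Size([8, 10])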

Hi @luan_nguy_n

That is an interesting question.

Generally, we go from a lower to a higher number of features when we want the model to increase its capacity and learn more patterns.

We keep the number of features equal when we just want a linear transformation of the features without changing the dimensionality.

We go from a higher to a lower number of features when we want to discard unwanted information or features. This helps speed up computation and avoid overfitting.
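As a rough illustration (the sizes below are arbitrary examples, not taken from any particular model):

import torch.nn as nn

expand = nn.Linear(128, 512)   # lower -> higher: more capacity to learn patterns
same   = nn.Linear(512, 512)   # equal: linear transformation, dimensionality unchanged
reduce = nn.Linear(512, 64)    # higher -> lower: drop unwanted information, cheaper and less prone to overfitting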


Thank you for your answer @Aniruth_Sundararaja1.
I experimented with my custom network and it gave me better results, and I am quite curious why. My hypothesis is the same as yours, but finding any research that talks about this is really hard.