I am currently experimenting with the pretrained SwinTransformer models from torchvision and inspecting the model structure with torchinfo. The code is as follows:
from torchvision.models import swin_t, Swin_T_Weights
import torch.nn as nn
from torchinfo import summary
model = swin_t(weights=Swin_T_Weights.DEFAULT)
features = nn.Sequential(*list(model.children())[:-1])
print(summary(model, input_size=(4, 3, 224, 224)))
print(summary(features, input_size=(4, 3, 224, 224)))
The torchinfo output is not what I expected. Below are the results for the last few layers of each module. I expected the output shape of AdaptiveAvgPool2d in the features module to be [4, 768, 1, 1], matching the full model, but it is [4, 7, 1, 1]. Can you tell me what’s wrong with my code?
# model
├─LayerNorm: 1-2 [4, 7, 7, 768] 1,536
├─AdaptiveAvgPool2d: 1-3 [4, 768, 1, 1] --
├─Linear: 1-4 [4, 1000] 769,000
# features
├─LayerNorm: 1-2 [4, 7, 7, 768] 1,536
├─AdaptiveAvgPool2d: 1-3 [4, 7, 1, 1] --
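For reference, the shape difference can be reproduced without the Swin model at all. This is only a minimal sketch, and it rests on an assumption I have not confirmed for my installed version: that SwinTransformer.forward permutes the channels-last LayerNorm output to channels-first with a plain tensor call rather than a child module, so model.children() (and therefore the nn.Sequential built from it) would not contain that step. The tensor below is a random stand-in for the LayerNorm output, and the variable names are made up for illustration.

```python
import torch
import torch.nn as nn

# Random stand-in for the LayerNorm output shown above: [batch, H, W, C] (channels-last).
layer_norm_out = torch.randn(4, 7, 7, 768)

pool = nn.AdaptiveAvgPool2d(1)

# AdaptiveAvgPool2d always reduces the last two dimensions, so on a
# channels-last tensor it averages over (W, C) and keeps H as "channels".
pooled_without_permute = pool(layer_norm_out)

# Assumed missing step: move channels to dim 1 before pooling, [B, H, W, C] -> [B, C, H, W].
pooled_with_permute = pool(layer_norm_out.permute(0, 3, 1, 2))

print(pooled_without_permute.shape)  # torch.Size([4, 7, 1, 1])
print(pooled_with_permute.shape)     # torch.Size([4, 768, 1, 1])
```

If that assumption holds, rebuilding the network from children() drops any operation that is written directly inside forward, which would explain why only the features variant reports [4, 7, 1, 1].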