torchsummary is deprecated; it hasn't been updated in a couple of years, as its repository shows. Use torchinfo instead, which works for me after adding the missing batch dimension:
import torchvision
from torchinfo import summary

model = torchvision.models.vit_b_16(weights='IMAGENET1K_SWAG_E2E_V1')
summary(model, input_size=(1, 3, 384, 384))
# ===============================================================================================
# Layer (type:depth-idx) Output Shape Param #
# ===============================================================================================
# VisionTransformer [1, 1000] 768
# ├─Conv2d: 1-1 [1, 768, 24, 24] 590,592
# ├─Encoder: 1-2 [1, 577, 768] 443,136
# │ └─Dropout: 2-1 [1, 577, 768] --
# │ └─Sequential: 2-2 [1, 577, 768] --
# │ │ └─EncoderBlock: 3-1 [1, 577, 768] 7,087,872
# │ │ └─EncoderBlock: 3-2 [1, 577, 768] 7,087,872
# │ │ └─EncoderBlock: 3-3 [1, 577, 768] 7,087,872
# │ │ └─EncoderBlock: 3-4 [1, 577, 768] 7,087,872
# │ │ └─EncoderBlock: 3-5 [1, 577, 768] 7,087,872
# │ │ └─EncoderBlock: 3-6 [1, 577, 768] 7,087,872
# │ │ └─EncoderBlock: 3-7 [1, 577, 768] 7,087,872
# │ │ └─EncoderBlock: 3-8 [1, 577, 768] 7,087,872
# │ │ └─EncoderBlock: 3-9 [1, 577, 768] 7,087,872
# │ │ └─EncoderBlock: 3-10 [1, 577, 768] 7,087,872
# │ │ └─EncoderBlock: 3-11 [1, 577, 768] 7,087,872
# │ │ └─EncoderBlock: 3-12 [1, 577, 768] 7,087,872
# │ └─LayerNorm: 2-3 [1, 577, 768] 1,536
# ├─Sequential: 1-3 [1, 1000] --
# │ └─Linear: 2-4 [1, 1000] 769,000
# ===============================================================================================
# Total params: 86,859,496
# Trainable params: 86,859,496
# Non-trainable params: 0
# Total mult-adds (M): 397.66
# ===============================================================================================
# Input size (MB): 1.77
# Forward/backward pass size (MB): 304.88
# Params size (MB): 232.27
# Estimated Total Size (MB): 538.92
# ===============================================================================================