AttributeError: 'NoneType' object has no attribute 'size'

Hi,

I am trying to get a summary of the pre-trained model by using the following code:

import torchvision
from torchsummary import summary
model = torchvision.models.vit_b_16(weights='IMAGENET1K_SWAG_E2E_V1')
summary(model, input_size=(3, 384, 384))

However, I do not understand the error output. Can somebody explain what is going wrong? The input size seems to match what the vision transformer expects (3×384×384), and doing the same thing for a ResNet works and prints a detailed description of the model.

Error stack trace:

/usr/local/lib/python3.9/dist-packages/torchsummary/torchsummary.py in <listcomp>(.0)
     21     if isinstance(output, (list, tuple)):
     22         summary[m_key]["output_shape"] = [
---> 23         [-1] + list(o.size())[1:] for o in output
     24     ]
     25     else:

Thanks for helping out!

Dirk

torchsummary is effectively deprecated; it hasn't been updated in a couple of years, as its repository shows. Use torchinfo instead, which works for me after adding the missing batch dimension:

import torchvision
from torchinfo import summary

model = torchvision.models.vit_b_16(weights='IMAGENET1K_SWAG_E2E_V1')
summary(model, input_size=(1, 3, 384, 384))

# ===============================================================================================
# Layer (type:depth-idx)                        Output Shape              Param #
# ===============================================================================================
# VisionTransformer                             [1, 1000]                 768
# ├─Conv2d: 1-1                                 [1, 768, 24, 24]          590,592
# ├─Encoder: 1-2                                [1, 577, 768]             443,136
# │    └─Dropout: 2-1                           [1, 577, 768]             --
# │    └─Sequential: 2-2                        [1, 577, 768]             --
# │    │    └─EncoderBlock: 3-1                 [1, 577, 768]             7,087,872
# │    │    └─EncoderBlock: 3-2                 [1, 577, 768]             7,087,872
# │    │    └─EncoderBlock: 3-3                 [1, 577, 768]             7,087,872
# │    │    └─EncoderBlock: 3-4                 [1, 577, 768]             7,087,872
# │    │    └─EncoderBlock: 3-5                 [1, 577, 768]             7,087,872
# │    │    └─EncoderBlock: 3-6                 [1, 577, 768]             7,087,872
# │    │    └─EncoderBlock: 3-7                 [1, 577, 768]             7,087,872
# │    │    └─EncoderBlock: 3-8                 [1, 577, 768]             7,087,872
# │    │    └─EncoderBlock: 3-9                 [1, 577, 768]             7,087,872
# │    │    └─EncoderBlock: 3-10                [1, 577, 768]             7,087,872
# │    │    └─EncoderBlock: 3-11                [1, 577, 768]             7,087,872
# │    │    └─EncoderBlock: 3-12                [1, 577, 768]             7,087,872
# │    └─LayerNorm: 2-3                         [1, 577, 768]             1,536
# ├─Sequential: 1-3                             [1, 1000]                 --
# │    └─Linear: 2-4                            [1, 1000]                 769,000
# ===============================================================================================
# Total params: 86,859,496
# Trainable params: 86,859,496
# Non-trainable params: 0
# Total mult-adds (M): 397.66
# ===============================================================================================
# Input size (MB): 1.77
# Forward/backward pass size (MB): 304.88
# Params size (MB): 232.27
# Estimated Total Size (MB): 538.92
# ===============================================================================================
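For what it's worth, the AttributeError itself comes from torchsummary's forward hook, which assumes every hooked module returns a tensor (or a list/tuple of tensors) it can call `.size()` on. Inside the ViT, one of the hooked submodules effectively hands back `None`, and the hook crashes. A minimal pure-Python sketch of that failure mode (no torch needed, just reproducing the line from the traceback):

```python
# Sketch of the logic at torchsummary/torchsummary.py line 23,
# with `out` standing in for the output the forward hook receives.
out = None  # what the hooked ViT submodule effectively returns

try:
    # torchsummary does roughly this to record the output shape:
    shape = [-1] + list(out.size())[1:]
except AttributeError as e:
    # This is exactly the error from the original post.
    print(e)  # 'NoneType' object has no attribute 'size'
```

torchinfo avoids this by tracing outputs more defensively instead of assuming every module output has a `.size()`.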