They take the weights into account, but the biases are not included. For an exact memory size calculation the bias should be included as well, is that right?

Yes, for the exact calculation you could add the `bias` shape, but it can be skipped as it's usually a small overhead.

E.g. the first conv layer:

```
import torch.nn as nn
conv = nn.Conv2d(3, 64, 3, padding=1)
```

would create an activation of shape `[batch_size, 64, 224, 224] = 3211264*batch_size` elements (for a 224x224 input), has a `weight` of `[64, 3, 3, 3] = 1728` elements and a `bias` of `[64] = 64` elements.

Given this large difference you can decide if the exact calculation would yield any important information.
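To double-check these numbers, the element counts can be reproduced with plain arithmetic. This is just a sketch: the helper `conv2d_out_size` is a made-up name for illustration, and it assumes a 224x224 input, stride 1, no dilation, and `batch_size = 1`.

```python
from math import prod

def conv2d_out_size(size, kernel, padding=0, stride=1):
    # Hypothetical helper: standard conv output-size formula (no dilation)
    return (size + 2 * padding - kernel) // stride + 1

batch_size = 1  # assumption; the activation count scales linearly with it
in_ch, out_ch, kernel, padding = 3, 64, 3, 1
h = w = 224     # assumed input resolution

out_h = conv2d_out_size(h, kernel, padding)
out_w = conv2d_out_size(w, kernel, padding)

activation_elems = prod([batch_size, out_ch, out_h, out_w])
weight_elems = prod([out_ch, in_ch, kernel, kernel])
bias_elems = out_ch

print(activation_elems, weight_elems, bias_elems)
# 3211264 1728 64
```

With `padding=1` and a 3x3 kernel the spatial size stays at 224x224, which is why the activation dominates both parameter tensors.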

Note that in the later layers the gap between the `weight` and `bias` sizes is even larger.

E.g.:

```
import torch.nn as nn

conv = nn.Conv2d(256, 512, 3)
print(conv.weight.nelement())
# 1179648
print(conv.bias.nelement())
# 512
# fraction of bias elements relative to weight elements
print(conv.bias.nelement()/conv.weight.nelement())
# 0.00043402777777777775
```
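If you want the exact size in bytes rather than element counts, something like the following should work. It's a minimal sketch: `param_bytes` is a hypothetical helper (not part of PyTorch), and the printed values assume the default `float32` dtype (4 bytes per element).

```python
import torch.nn as nn

def param_bytes(module, include_bias=True):
    # Hypothetical helper: sum nelement() * element_size() over all parameters
    total = 0
    for name, p in module.named_parameters():
        if not include_bias and name.endswith("bias"):
            continue  # skip bias tensors when asked to
        total += p.nelement() * p.element_size()
    return total

conv = nn.Conv2d(256, 512, 3)
with_bias = param_bytes(conv)
without_bias = param_bytes(conv, include_bias=False)

print(with_bias)                 # 4720640
print(without_bias)              # 4718592
print(with_bias - without_bias)  # 2048
```

So for this layer the bias adds about 2 KB on top of roughly 4.7 MB of weights.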

It is so convenient to compute the size of the weights and biases with PyTorch, great!