How do you calculate the number of activations for:
l = nn.Linear(10, 100)
c = nn.Conv2d(1, 2, kernel_size=(3,4))
I would say 1000 for l, if this is the number of weights. But I am not sure.
The linear layer is pretty straightforward, as the output activation will have the shape [batch_size, *, out_features].
The activation of the conv layer is a bit trickier, as the shape depends on the spatial size of your input:
x = torch.randn(2, 10)
out = l(x)
print(out.nelement()) # batch_size * out_features
> 200
x = torch.randn(2, 1, 24, 24)
out = c(x)
print(out.nelement()) # batch_size * out_channels * (h - (kH - 1)) * (w - (kW - 1))
> 1848
print(out.shape)
> torch.Size([2, 2, 22, 21])
Unbelievable, I never thought this calculation would be so interesting.
It shows that the computation really becomes complex once padding, stride, and dilation are involved:
c = nn.Conv2d(1, 2, kernel_size=(3,4), padding=1, stride=2, dilation=2)
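For the general case, the output spatial size follows the formula given in the nn.Conv2d docs. Here is a minimal sketch of that formula in plain Python (no torch needed), applied to this layer with an assumed input size of 24x24 as in the earlier example:

```python
import math

def conv2d_out_size(size, kernel, padding, stride, dilation):
    # Formula from the nn.Conv2d docs:
    # out = floor((in + 2*padding - dilation*(kernel - 1) - 1) / stride + 1)
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

# c = nn.Conv2d(1, 2, kernel_size=(3, 4), padding=1, stride=2, dilation=2)
# applied to an input of shape [2, 1, 24, 24]:
h_out = conv2d_out_size(24, 3, padding=1, stride=2, dilation=2)  # kernel height 3
w_out = conv2d_out_size(24, 4, padding=1, stride=2, dilation=2)  # kernel width 4
print(h_out, w_out)  # 11 10  -> output shape [2, 2, 11, 10]
```

With the defaults (padding=0, stride=1, dilation=1) this reduces to `in - (kernel - 1)`, which reproduces the [2, 2, 22, 21] shape shown above.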
I see now that bs plays a key role, but I wonder why only w is taken into account. Why is the b part ignored? (I know in some cases b can be ignored.)
You can find the actual computation for the output shape in the docs.
I'm a bit confused. Are you looking for the number of parameters (weight + bias) or the output activation shape?
I was unaware of what "number of activations" meant until now.
In case you would like to count the number of parameters, you could use:
lin_params = 0
for param in l.parameters():
    lin_params += param.nelement()

conv_params = 0
for param in c.parameters():
    conv_params += param.nelement()
Tested and it works great. Thanks.