Can I see what you are trying to do? The parameters should not be empty unless you have something like:

```
from torch import nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()

model = Model()
```

The above model has no parameters.

You can see it here: How does one make sure that the custom NN has parameters?

```
import torch

class NN(torch.nn.Module):
    def __init__(self, D_layers, act, w_inits, b_inits, bias=True):
        super().__init__()
        # activation function
        self.act = act
        # create linear layers
        # NOTE: a plain Python list does not register the layers as submodules,
        # so their weights never show up in model.parameters()
        self.linear_layers = [None]
        for d in range(1, len(D_layers)):
            linear_layer = torch.nn.Linear(D_layers[d-1], D_layers[d], bias=bias)
            self.linear_layers.append(linear_layer)
```
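For reference, here is a minimal sketch of why the parameter list comes back empty in a model like the one above. The class names `ListNet` and `ModuleListNet` are made up for illustration; the point is only that layers stored in a plain Python list are not registered, while layers stored in `nn.ModuleList` are.

```python
from torch import nn


class ListNet(nn.Module):
    """Hypothetical model: layers kept in a plain Python list."""

    def __init__(self, sizes):
        super().__init__()
        # A plain list is just an attribute; the layers are not registered.
        self.layers = [nn.Linear(a, b) for a, b in zip(sizes[:-1], sizes[1:])]


class ModuleListNet(nn.Module):
    """Hypothetical model: same layers, kept in an nn.ModuleList."""

    def __init__(self, sizes):
        super().__init__()
        # nn.ModuleList registers every layer as a submodule.
        self.layers = nn.ModuleList(
            [nn.Linear(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
        )


sizes = [4, 8, 2]
n_list = len(list(ListNet(sizes).parameters()))
n_modlist = len(list(ModuleListNet(sizes).parameters()))
print(n_list, n_modlist)  # 0 parameter tensors vs. 4 (two weights, two biases)
```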

I posted my response on your original question!


```
def get_n_params(model):
    pp = 0
    for p in list(model.parameters()):
        nn = 1
        for s in list(p.size()):
            nn = nn * s
        pp += nn
    return pp
```


To compute the number of *trainable* parameters:

```
import numpy as np

model_parameters = filter(lambda p: p.requires_grad, model.parameters())
params = sum([np.prod(p.size()) for p in model_parameters])
```


I like this solution!

To add my 50 cents, I would use `numel()` instead of `np.prod()` and compress the expression into one line:

```
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```


Even though the models should be equivalent in Keras and PyTorch, the number of trainable parameters reported differs between the two.

`import torch`

`import torchvision`

`from torch import nn`

`from torchvision import models`

`a = models.resnet50(pretrained=False)`

`a.fc = nn.Linear(512, 2)`

`count = count_parameters(a)`

`print(count)`

**23509058**

Now in Keras:

`import keras.applications.resnet50 as resnet`

`model = resnet.ResNet50(include_top=True, weights=None, input_tensor=None, input_shape=None, pooling=None, classes=2)`

`model.summary()`

**Total params: 23,591,810**

**Trainable params: 23,538,690**

**Non-trainable params: 53,120**

Any reason why this difference in numbers pops up?


Hi Alex, well spotted. I never did this comparison.

One easy check is to compare the layers one by one (Linear, Conv2d, BatchNorm, etc.) and see if there’s any difference in the number of params.

However, I don’t think there will be any difference, provided that you pay attention to the sneaky default parameters.

After that, you can patiently compare the graphs layer by layer and see if you spot any difference. Maybe it’s a matter of omitted/shared biases in some of the layers.

Btw, the first test is also a good check for the `count_parameters()` function; let us know if you discover some unexpected behavior.
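A rough sketch of what such a layer-by-layer comparison could look like on the PyTorch side. The helper names `params_by_layer_type` and `count_buffer_elements` and the toy model are made up for illustration. One thing to keep in mind when comparing against a Keras summary: BatchNorm running statistics are buffers in PyTorch, not parameters, so `model.parameters()` never sees them.

```python
from collections import Counter

from torch import nn


def params_by_layer_type(model):
    """Sum trainable parameter elements per module class name."""
    counts = Counter()
    for module in model.modules():
        # recurse=False: only the parameters owned directly by this module.
        # Caveat: a parameter shared by two modules is counted once per owner.
        n = sum(
            p.numel()
            for p in module.parameters(recurse=False)
            if p.requires_grad
        )
        if n:
            counts[type(module).__name__] += n
    return counts


def count_buffer_elements(model):
    """Count elements stored as buffers (e.g. BatchNorm running stats)."""
    return sum(b.numel() for b in model.buffers())


# Toy model, only used to show the bookkeeping (no forward pass is run).
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, bias=False),
    nn.BatchNorm2d(8),
    nn.Flatten(),
    nn.Linear(8, 2),
)
by_type = params_by_layer_type(toy)
print(dict(by_type))
print(count_buffer_elements(toy))
```

Running the same kind of per-type tally on both frameworks should make it easier to see which layer family the extra parameters come from.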

Have you checked if they are the bias weights?

I guess this counts shared parameters multiple times, doesn’t it?

```
import torch
from models.modelparts import count_parameters

class tstModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.p = torch.nn.Parameter(
            torch.randn(1, 1, 1, requires_grad=True)
            .expand(-1, 5, -1)
        )

print(count_parameters(tstModel()))
```

prints `5`.

If I understand correctly, `expand` just creates a tensor with 5 views of the same parameter, so the right answer should be `1`.

But I don’t know how to fix that.


Did anyone figure out a solution for shared parameters?
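One possible approach, as a sketch rather than a definitive answer: `model.parameters()` already yields a parameter object only once even if several modules share it, so tied weights are not double-counted. For the `expand` case, a heuristic is to skip dimensions whose stride is 0, since those repeat the same memory. The helper name `count_unique_elements` and the two toy classes are made up for illustration, and the heuristic does not handle other kinds of overlapping views.

```python
import math

import torch
from torch import nn


def count_unique_elements(model):
    """Heuristic count of trainable elements, ignoring expanded dims."""
    total = 0
    for p in model.parameters():  # shared Parameter objects appear once
        if not p.requires_grad:
            continue
        # A stride of 0 means the dimension was broadcast via expand(),
        # so it adds no new memory; leave it out of the product.
        total += math.prod(
            size for size, stride in zip(p.shape, p.stride()) if stride != 0
        )
    return total


class Expanded(nn.Module):
    def __init__(self):
        super().__init__()
        self.p = nn.Parameter(torch.randn(1, 1, 1).expand(-1, 5, -1))


class Tied(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Linear(3, 3, bias=False)
        self.b = nn.Linear(3, 3, bias=False)
        self.b.weight = self.a.weight  # tie the two layers' weights


naive_expanded = sum(p.numel() for p in Expanded().parameters())
unique_expanded = count_unique_elements(Expanded())
unique_tied = count_unique_elements(Tied())
print(naive_expanded, unique_expanded, unique_tied)
```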

So I get that by default, `Conv2d` includes the bias. But I’m unclear as to why they (the biases) are being included in ‘requires_grad’.

`In [1]: conv_3 = nn.Conv2d(512, 256, kernel_size=3, bias=True)`

`In [2]: sum(p.numel() for p in conv_3.parameters())`

`Out[2]: 1179904`

`In [3]: sum(p.numel() for p in conv_3.parameters() if p.requires_grad)`

`Out[3]: 1179904`


The `bias` is a trainable parameter, which requires gradients and is optimized in the same way as the `weight` parameter.

Do you have a use case where the bias is fixed to a specific value?
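If the bias should stay fixed, one way (a small sketch, reusing the `count_parameters` helper from earlier in the thread) is to turn off its gradient. The trainable count then drops by exactly the 256 bias elements.

```python
from torch import nn


def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


conv = nn.Conv2d(512, 256, kernel_size=3, bias=True)
before = count_parameters(conv)   # weight (256*512*3*3) + bias (256)

conv.bias.requires_grad_(False)   # freeze the bias; it stays in the module
after = count_parameters(conv)    # weight only

print(before, after, before - after)
```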

Ah sorry. It was a conceptual error on my part. I had confused the idea of bias being a constant value **with a weight** with bias being a constant value.

Thanks for the clarification.

Just out of curiosity, is there a `np.prod` for PyTorch?

Well, there’s `torch.prod`, but unlike NumPy it accepts only tensors and does not accept tuples, lists, etc.
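A small sketch of the options for multiplying out a shape. The tensor `p` is just a stand-in. `torch.prod` needs the shape wrapped in a tensor first, while the standard library's `math.prod` (Python 3.8+) takes the `torch.Size` tuple directly, and `numel()` avoids the product altogether.

```python
import math

import torch

p = torch.zeros(4, 3, 2)  # stand-in for a parameter tensor

n_torch = torch.prod(torch.tensor(p.shape)).item()  # shape -> tensor -> prod
n_math = math.prod(p.shape)                         # plain Python, no tensor
n_numel = p.numel()                                 # simplest option

print(n_torch, n_math, n_numel)
```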


For finding the total number of parameter elements (if you are interested in the total size of the parameter space rather than the number of parameter tensors), I use `sum(p.numel() for p in model.parameters())`


@hughperkins Is this Stack Overflow answer a reasonable way to handle not double-counting shared parameters?

```
from prettytable import PrettyTable

def count_parameters(model):
    table = PrettyTable(["Modules", "Parameters"])
    total_params = 0
    for name, parameter in model.named_parameters():
        if not parameter.requires_grad:
            continue
        param = parameter.numel()
        table.add_row([name, param])
        total_params += param
    print(table)
    print(f"Total Trainable Params: {total_params}")
    return total_params

count_parameters(model)
```