Model.parameters overestimating parameter count

I wanted to compare my model to ResNet in terms of the number of parameters. ResNet had 22M…

Using the statement

pytorch_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

I got 129M! I am using a basic UNET with feature layers [64,128,256,512,1024]. Why is it so large?

import torch
import torch.nn as nn
import torchvision.transforms.functional as TF  # assumed source of TF.resize used below

class UNet(nn.Module):
    def __init__(self, in_channels=3, out_channels=1, features=[64, 128, 256, 512, 1024]):
        super(UNet, self).__init__()
        self.ups = nn.ModuleList()
        self.downs = nn.ModuleList()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        # Down part of UNET
        for feature in features:
            self.downs.append(DoubleConv(in_channels, feature))
            in_channels = feature

        # Up part of UNET: each step is an upsampling ConvTranspose2d followed by a DoubleConv
        for feature in reversed(features):
            self.ups.append(
                nn.ConvTranspose2d(
                    feature*2, feature, kernel_size=2, stride=2,
                )
            )
            self.ups.append(DoubleConv(feature*2, feature))

        self.bottleneck = DoubleConv(features[-1], features[-1]*2)
        self.final_conv = nn.Conv2d(features[0], out_channels, kernel_size=1)

    def forward(self, x):
        skip_connections = []

        for down in self.downs:
            x = down(x)
            skip_connections.append(x)  # keep the pre-pooling activation for the skip connection
            x = self.pool(x)

        x = self.bottleneck(x)
        skip_connections = skip_connections[::-1]

        # self.ups alternates ConvTranspose2d (even index) and DoubleConv (odd index)
        for idx in range(0, len(self.ups), 2):
            x = self.ups[idx](x)
            skip_connection = skip_connections[idx//2]

            if x.shape != skip_connection.shape:
                x = TF.resize(x, size=skip_connection.shape[2:])

            concat_skip = torch.cat((skip_connection, x), dim=1)
            x = self.ups[idx+1](concat_skip)

        return self.final_conv(x)

Could you post the definition of the missing modules, please?

@ptrblck See below. In the literature, people report about 30M parameters for this architecture, and I am way above that unless I am missing something.

class DoubleConv(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(DoubleConv, self).__init__()
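        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )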

    def forward(self, x):
        return self.conv(x)

The parameter count seems to be expected. You can take a look at some of the large layers via:

model = UNet()

nb_params = 0
for name, param in model.named_parameters():
    print("parameter {} contains {} elements".format(name, param.nelement()))
    nb_params += param.nelement()
print(nb_params / 1e6)
# 124.374209

# Shapes of the three largest weight tensors:
# torch.Size([1024, 2048, 3, 3])
# torch.Size([2048, 1024, 3, 3])
# torch.Size([2048, 2048, 3, 3])

The models in the literature might use a different architecture, as the last conv layer printed above alone contains ~37.75 million parameters:

model.bottleneck.conv[3].weight.nelement() / 1e6
# 37.748736
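
As a rough cross-check, the count can also be worked out by hand without building the model. The sketch below is not from the thread: the helper names `double_conv_params` and `unet_params` are hypothetical, and it assumes each DoubleConv holds two bias-free 3x3 convolutions, each followed by a BatchNorm2d (which the `conv[3]` index and the 124.374209 total both suggest). The four-level feature list is only an assumption about what the ~30M figure in the literature might refer to.

```python
def double_conv_params(c_in, c_out):
    """Two bias-free 3x3 convs, each followed by BatchNorm2d (weight + bias)."""
    convs = 9 * c_in * c_out + 9 * c_out * c_out
    batch_norms = 2 * (2 * c_out)
    return convs + batch_norms


def unet_params(features, in_channels=3, out_channels=1):
    """Per-group parameter counts for the UNet layout posted above."""
    counts = {"downs": 0, "bottleneck": 0, "ups": 0, "final_conv": 0}

    c_in = in_channels
    for f in features:
        counts["downs"] += double_conv_params(c_in, f)
        c_in = f

    counts["bottleneck"] = double_conv_params(features[-1], 2 * features[-1])

    for f in reversed(features):
        # ConvTranspose2d(2f, f, kernel_size=2) with bias: 2f * f * 2 * 2 + f
        counts["ups"] += 2 * f * f * 4 + f
        counts["ups"] += double_conv_params(2 * f, f)

    # 1x1 Conv2d with bias
    counts["final_conv"] = features[0] * out_channels + out_channels
    counts["total"] = sum(counts.values())
    return counts


five_level = unet_params([64, 128, 256, 512, 1024])  # feature list from the question
four_level = unet_params([64, 128, 256, 512])        # assumed alternative, one level fewer
print(five_level["total"] / 1e6)  # 124.374209
print(four_level["total"] / 1e6)  # 31.037633
```

Under these assumptions, dropping the 1024 entry from the feature list (so that the bottleneck has 1024 channels instead of 2048) lands close to the ~30M figure, which may be what the literature numbers are based on.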

Thanks for the info! Could it be because I am doing multi-class rather than binary segmentation? If I divide by 3, I get into the ballpark.

No, I don’t think so. The number of classes is represented by out_channels, which is only used in self.final_conv, while the majority of the parameters are stored in the ups and bottleneck layers, which do not depend on the number of classes.
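
To put a number on this, here is a small sketch of how little the class count contributes. The helper `final_conv_params` is hypothetical; it assumes the 1x1 `nn.Conv2d(features[0], out_channels, kernel_size=1)` from the posted model with its default bias.

```python
def final_conv_params(first_feature, out_channels):
    # 1x1 Conv2d with bias: one weight per (input channel, class) pair
    # plus one bias per class.
    return first_feature * out_channels + out_channels


binary = final_conv_params(64, 1)
three_class = final_conv_params(64, 3)
print(binary, three_class, three_class - binary)  # 65 195 130
```

Going from one to three output channels adds only 130 parameters, which is negligible next to a total of over 100 million.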