Decrease in model parameters in dynamic quantization

import torch
from torch import nn
from torch.quantization import quantize_dynamic

class UnifiedModel(nn.Module):
    def __init__(self):
        super(UnifiedModel, self).__init__()
        self.linear1 = nn.Linear(15, 10)
        self.conv = nn.Conv1d(10, 10, 1)
        self.linear2 = nn.Linear(10, 5)

    def forward(self, x):
        x = self.linear2(self.conv(self.linear1(x).transpose(-1, -2)).transpose(-1, -2))
        return x

model = UnifiedModel()
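
For reference, a quick sanity check of the shapes the forward pass expects (assuming a batch x seq_len x features input, which is what the transposes around the Conv1d suggest):

x = torch.randn(2, 7, 15)   # (batch, seq_len, in_features)
print(model(x).shape)       # torch.Size([2, 7, 5])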


# dynamic quantization: replace Linear (and LSTM, if present) modules
# with dynamically quantized versions holding float16 weights
quantized_model = quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.float16
)

Counting the parameters of both the non-quantized and the quantized model:

print("Number of parameters before:", sum(p.numel() for p in model.parameters()))
print("Number of parameters before:", sum(p.numel() for p in quantized_model.parameters()))

Results:

Number of parameters before: 325
Number of parameters after: 110

My understanding was that dynamic quantization supports Linear and LSTM layers, so in this case the linear layers should perform their operations in float16, which is indeed the case:

print(quantized_model)

Result:

UnifiedModel(
  (linear1): DynamicQuantizedLinear(in_features=15, out_features=10, dtype=torch.float16)
  (conv): Conv1d(10, 10, kernel_size=(1,), stride=(1,))
  (linear2): DynamicQuantizedLinear(in_features=10, out_features=5, dtype=torch.float16)
)

But why does the number of parameters of the quantized model decrease in this case? Aren't we just lowering the precision of the weights?
Also, printing the parameters:
Non-Quantized Model:

for name, param in model.named_parameters():
    print(name, param.shape)

Result:

linear1.weight torch.Size([10, 15])
linear1.bias torch.Size([10])
conv.weight torch.Size([10, 10, 1])
conv.bias torch.Size([10])
linear2.weight torch.Size([5, 10])
linear2.bias torch.Size([5])

Quantized Model:

for name, param in quantized_model.named_parameters():
    print(name, param.shape)

Result:

conv.weight torch.Size([10, 10, 1])
conv.bias torch.Size([10])

In the quantized model there are no parameters at all for the linear layers. Also, where are their weights stored and used during inference?

I guess you are counting the parameters of the quantized model incorrectly, since quantized parameters might not show up in .parameters() anymore.
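
That matches the numbers above: the Conv1d layer is the only module still exposing regular parameters (10*10*1 + 10 = 110), while the missing 215 linear parameters (150 + 10 + 50 + 5) now live inside the quantized modules as packed weights. A minimal sketch of how you could inspect them, assuming the DynamicQuantizedLinear modules expose a weight() accessor and that the packed weights are part of state_dict():

# using the quantized_model from above
# only the conv layer still shows up in .parameters(): 10*10*1 + 10 = 110
print(sum(p.numel() for p in quantized_model.parameters()))   # 110

# the linear weights still exist, stored inside the quantized modules
# (assumption: dynamically quantized Linear exposes a weight() accessor)
print(quantized_model.linear1.weight().shape)                 # torch.Size([10, 15])

# they are also part of the state_dict, so they get saved/loaded with the model
for key in quantized_model.state_dict().keys():
    print(key)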

Oh ok! @ptrblck But how is it saved internally?
If I were to compute the model's size in memory when loading it:

def print_model_size(model):
    # only counts tensors exposed via .parameters() and .buffers()
    model_size_bytes = sum(p.numel() * p.element_size() for p in model.parameters()) + \
                       sum(b.numel() * b.element_size() for b in model.buffers())

    model_size = model_size_bytes / (1024**2)  # MB
    return model_size

then for a dynamically quantized model the size of the quantized parameters will not be included.
What would be the best approach for calculating the model size of a quantized model in this case?

Here is a tutorial with a section at the end about calculating model size:

https://pytorch.org/tutorials/advanced/dynamic_quantization_tutorial.html
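
The approach there is roughly to serialize the state_dict to disk and measure the resulting file, which also captures the packed quantized weights that .parameters() misses. A minimal sketch along those lines (the helper name and temp file path are illustrative):

import os
import torch

def model_size_mb(model):
    # save everything the model serializes (including packed quantized weights)
    # to a temporary file and measure its size on disk
    torch.save(model.state_dict(), "temp.p")
    size_mb = os.path.getsize("temp.p") / 1e6
    os.remove("temp.p")
    return size_mb

print("fp32 model (MB):     ", model_size_mb(model))
print("quantized model (MB):", model_size_mb(quantized_model))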