import torch
from torch import nn
from torch.quantization import quantize_dynamic
class UnifiedModel(nn.Module):
    def __init__(self):
        super(UnifiedModel, self).__init__()
        self.linear1 = nn.Linear(15, 10)
        self.conv = nn.Conv1d(10, 10, 1)
        self.linear2 = nn.Linear(10, 5)

    def forward(self, x):
        # (batch, seq, features) -> linear1 -> transpose to (batch, channels, seq)
        # for Conv1d, then transpose back for linear2
        x = self.linear2(self.conv(self.linear1(x).transpose(-1, -2)).transpose(-1, -2))
        return x
model = UnifiedModel()
# Dynamic quantization (note: this model has no LSTM, so only the
# nn.Linear layers are converted; Conv1d is left untouched)
quantized_model = quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.float16
)
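For context, both models run fine on a dummy input; I'm assuming a (batch, seq_len, features) layout of (2, 7, 15) here so that the last dimension matches linear1:
x = torch.randn(2, 7, 15)        # assumed (batch, seq_len, features)
print(model(x).shape)            # torch.Size([2, 7, 5])
print(quantized_model(x).shape)  # torch.Size([2, 7, 5])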
When counting the parameters of both models:
print("Parameters (original):", sum(p.numel() for p in model.parameters()))
print("Parameters (quantized):", sum(p.numel() for p in quantized_model.parameters()))
Results:
Parameters (original): 325
Parameters (quantized): 110
My understanding is that dynamic quantization supports Linear and LSTM layers, so here the linear layers should perform their operations in float16, which is indeed the case:
print(quantized_model)
Result:
UnifiedModel(
(linear1): DynamicQuantizedLinear(in_features=15, out_features=10, dtype=torch.float16)
(conv): Conv1d(10, 10, kernel_size=(1,), stride=(1,))
(linear2): DynamicQuantizedLinear(in_features=10, out_features=5, dtype=torch.float16)
)
But why does the parameter count of the quantized model decrease? Aren't we just lowering the precision?
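For reference, the drop exactly matches the two linear layers: linear1 has 10*15 + 10 = 160 parameters, linear2 has 5*10 + 5 = 55, and 325 - (160 + 55) = 110, which is precisely the Conv1d layer (10*10*1 + 10 = 110). So the linear weights don't just change dtype; they disappear from parameters() entirely.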
Also, when printing the named parameters:
For the non-quantized model:
for name, param in model.named_parameters():
print(name, param.shape)
Result:
linear1.weight torch.Size([10, 15])
linear1.bias torch.Size([10])
conv.weight torch.Size([10, 10, 1])
conv.bias torch.Size([10])
linear2.weight torch.Size([5, 10])
linear2.bias torch.Size([5])
For the quantized model:
for name, param in quantized_model.named_parameters():
print(name, param.shape)
Result:
conv.weight torch.Size([10, 10, 1])
conv.bias torch.Size([10])
In the quantized model, there are no parameters at all for the linear layers. Also, where does it store those weights during inference?
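The only way I've found to still reach the linear weights is through the quantized module itself; here I'm assuming the weight() accessor that DynamicQuantizedLinear appears to expose, and inspecting the state_dict (a sketch of what I tried, not a definitive answer):
# Sketch: the quantized Linear no longer registers nn.Parameter objects,
# but its weight is still retrievable via the module's weight() accessor
# (assumed API of DynamicQuantizedLinear)
print(quantized_model.linear1.weight().shape)  # torch.Size([10, 15])
print(quantized_model.linear1.weight().dtype)  # torch.float16

# the weights also still appear in the state_dict, stored as packed
# params rather than as parameters
for name in quantized_model.state_dict():
    print(name)
Is it correct that they live in these packed objects rather than as nn.Parameters, and that this is why named_parameters() no longer reports them?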