Hello! I have a pre-trained pruned network with 75% sparsity.
I would like to quantize this network in a way that maintains its sparsity during inference. I've opted for symmetric quantization, and my understanding is that the zero point should then be 0. However, I get zero_point=128 instead. Below is a snippet of my code:
import torch

model.eval()
model.to('cpu')

# Symmetric per-tensor quantization: quint8 for activations, qint8 for weights
quantization_config = torch.quantization.QConfig(
    activation=torch.quantization.MinMaxObserver.with_args(
        dtype=torch.quint8, qscheme=torch.per_tensor_symmetric),
    weight=torch.quantization.MinMaxObserver.with_args(
        dtype=torch.qint8, qscheme=torch.per_tensor_symmetric))
model.qconfig = quantization_config

quant_model = torch.quantization.prepare(model)
calibrate(quant_model, train_loader, batches_per_epoch)  # my own helper: runs calibration batches through the observers
quant_model = torch.quantization.convert(quant_model)    # swap in the quantized modules
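To isolate the behavior from my model, here is a minimal reproduction of just the observer step (a sketch on a random toy tensor, not my actual data):

import torch
from torch.quantization import MinMaxObserver

x = torch.randn(100)

# Activation observer: quint8 + per_tensor_symmetric
act_obs = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_symmetric)
act_obs(x)
print(act_obs.calculate_qparams())  # zero_point comes out as 128

# Weight observer: qint8 + per_tensor_symmetric
w_obs = MinMaxObserver(dtype=torch.qint8, qscheme=torch.per_tensor_symmetric)
w_obs(x)
print(w_obs.calculate_qparams())  # zero_point comes out as 0

So the zero_point=128 only shows up for the quint8 activation observers; the qint8 weight observers do give me 0.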
When printing quant_model, this is the output:
VGGQuant(
  (features): Sequential(
    (0): QuantizedConv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), scale=0.07883524149656296, zero_point=128, padding=(1, 1))
    (1): QuantizedBatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): QuantizedConv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), scale=0.05492561683058739, zero_point=128, padding=(1, 1))
    (4): QuantizedBatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (7): QuantizedConv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), scale=0.05388055741786957, zero_point=128, padding=(1, 1))
    (8): QuantizedBatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (9): ReLU(inplace=True)
    (10): QuantizedConv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), scale=0.03040805645287037, zero_point=128, padding=(1, 1))
    (11): QuantizedBatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (12): ReLU(inplace=True)
    (13): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (14): QuantizedConv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), scale=0.023659387603402138, zero_point=128, padding=(1, 1))
    (15): QuantizedBatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (16): ReLU(inplace=True)
    (17): QuantizedConv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), scale=0.01725710742175579, zero_point=128, padding=(1, 1))
    (18): QuantizedBatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (19): ReLU(inplace=True)
    (20): QuantizedConv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), scale=0.013385827653110027, zero_point=128, padding=(1, 1))
    (21): QuantizedBatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (22): ReLU(inplace=True)
    (23): QuantizedConv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), scale=0.011628611013293266, zero_point=128, padding=(1, 1))
    (24): QuantizedBatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (25): ReLU(inplace=True)
    (26): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (27): QuantizedConv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), scale=0.00966070219874382, zero_point=128, padding=(1, 1))
    (28): QuantizedBatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (29): ReLU(inplace=True)
    (30): QuantizedConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), scale=0.006910551339387894, zero_point=128, padding=(1, 1))
    (31): QuantizedBatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (32): ReLU(inplace=True)
    (33): QuantizedConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), scale=0.002619387349113822, zero_point=128, padding=(1, 1))
    (34): QuantizedBatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (35): ReLU(inplace=True)
    (36): QuantizedConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), scale=0.002502179006114602, zero_point=128, padding=(1, 1))
    (37): QuantizedBatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (38): ReLU(inplace=True)
    (39): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (40): QuantizedConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), scale=0.00118942407425493, zero_point=128, padding=(1, 1))
    (41): QuantizedBatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (42): ReLU(inplace=True)
    (43): QuantizedConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), scale=0.0017956980736926198, zero_point=128, padding=(1, 1))
    (44): QuantizedBatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (45): ReLU(inplace=True)
    (46): QuantizedConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), scale=0.0021184098441153765, zero_point=128, padding=(1, 1))
    (47): QuantizedBatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (48): ReLU(inplace=True)
    (49): QuantizedConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), scale=0.0019303301814943552, zero_point=128, padding=(1, 1))
    (50): QuantizedBatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (51): ReLU(inplace=True)
    (52): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (53): AvgPool2d(kernel_size=1, stride=1, padding=0)
  )
  (classifier): QuantizedLinear(in_features=512, out_features=10, scale=0.0953117236495018, zero_point=128, qscheme=torch.per_tensor_affine)
  (quant): Quantize(scale=tensor([0.0216]), zero_point=tensor([128]), dtype=torch.quint8)
  (dequant): DeQuantize()
)
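Since my main concern is preserving sparsity, I also looked at what zero_point=128 does to pruned zeros with a quick round-trip check (a minimal sketch with a made-up scale and toy values, not my actual weights):

import torch

scale = 0.05
x = torch.tensor([0.0, 0.0, 0.3, -0.2])  # "pruned" tensor: half the entries are exact zeros

# quint8 with zero_point=128 (what I currently get for activations)
q = torch.quantize_per_tensor(x, scale=scale, zero_point=128, dtype=torch.quint8)
print(q.int_repr())    # zeros are stored as the integer 128, not 0
print(q.dequantize())  # but they dequantize back to exactly 0.0

# qint8 with zero_point=0 (what I expected from symmetric quantization)
q = torch.quantize_per_tensor(x, scale=scale, zero_point=0, dtype=torch.qint8)
print(q.int_repr())    # zeros are stored as literal integer 0s

So the float zeros survive the round trip either way, but with zero_point=128 the stored integers are 128 rather than 0, which is what bothers me.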
Should I use a different quantization scheme? Is there something I'm missing? I'd like the zero point to be 0 for all layers.