Hi,
I am experimenting with simulating different quantization bit widths for QAT on ResNet18.
I set up a custom qconfig for selected modules when initializing the model, e.g.:
import torch
import torch.ao.quantization as quant

self.qconfig = quant.get_default_qat_qconfig("x86")
...
# activation_bits and weight_bits are loaded from a config file;
# weight_observer and weight_scheme are set from the config as well
custom_qconfig = quant.QConfig(
    activation=quant.MinMaxObserver.with_args(
        quant_min=0,
        quant_max=(2**activation_bits - 1) // 2,
        dtype=torch.quint8,
        qscheme=torch.per_tensor_affine,
    ),
    weight=weight_observer.with_args(
        quant_min=-(2 ** (weight_bits - 1)) // 2,
        quant_max=(2 ** (weight_bits - 1) - 1) // 2,
        dtype=torch.qint8,
        qscheme=weight_scheme,
    ),
)
Then I follow the steps from the QAT tutorial, i.e. running the fuse_modules_qat and prepare_qat functions. But after this step it seems that my changes to quant_min/quant_max are not propagated into the fused modules, no matter what I pass as the min/max args. Maybe they are not supposed to be, since those modules appear to take their attributes from the default self.qconfig = quant.get_default_qat_qconfig("x86").
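For reference, my fuse/prepare step roughly looks like this (a sketch: the fuse list only shows the stem, and the module names assume torchvision's ResNet18):

model.train()
# fuse Conv+BN+ReLU groups before QAT (only the stem is shown here;
# the real call lists every fusable group in the network)
model = quant.fuse_modules_qat(model, [["conv1", "bn1", "relu"]])
# insert fake-quant/observer modules according to each module's qconfig
model = quant.prepare_qat(model)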
# output of print(model.conv1)
ConvBnReLU2d(
3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(weight_fake_quant): FusedMovingAvgObsFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0'), observer_enabled=tensor([1], device='cuda:0'), scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32), dtype=torch.qint8, quant_min=-128, quant_max=127, qscheme=torch.per_channel_symmetric, reduce_range=False
(activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=tensor([], device='cuda:0'), max_val=tensor([], device='cuda:0'))
)
(activation_post_process): FusedMovingAvgObsFakeQuantize(
fake_quant_enabled=tensor([1], device='cuda:0'), observer_enabled=tensor([1], device='cuda:0'), scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32), dtype=torch.quint8, quant_min=0, quant_max=127, qscheme=torch.per_tensor_affine, reduce_range=True
(activation_post_process): MovingAverageMinMaxObserver(min_val=inf, max_val=-inf)
)
)
My question is whether there is any way to do this, since I would like to have a different level of “quantization” in each layer.
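Concretely, I would like to end up with something like this (a sketch; make_qconfig is a hypothetical helper that parameterizes the QConfig above by bit width, with MinMaxObserver and per_tensor_symmetric standing in for the weight_observer/weight_scheme from my config):

def make_qconfig(activation_bits, weight_bits):
    # hypothetical helper: builds the custom QConfig above for given bit widths
    return quant.QConfig(
        activation=quant.MinMaxObserver.with_args(
            quant_min=0,
            quant_max=(2**activation_bits - 1) // 2,
            dtype=torch.quint8,
            qscheme=torch.per_tensor_affine,
        ),
        weight=quant.MinMaxObserver.with_args(
            quant_min=-(2 ** (weight_bits - 1)) // 2,
            quant_max=(2 ** (weight_bits - 1) - 1) // 2,
            dtype=torch.qint8,
            qscheme=torch.per_tensor_symmetric,
        ),
    )

# e.g. keep early layers at 8 bits and quantize deeper layers to 4 bits,
# assigned per module before prepare_qat
model.layer1.qconfig = make_qconfig(8, 8)
model.layer4.qconfig = make_qconfig(4, 4)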
Maybe I am just missing something, as I am still new to this, so any help is appreciated!
Thanks in advance!