Simulating quantization at lower bit precision with quant_min/quant_max settings on fused modules

Hi,
I am experimenting with simulating different quantization bit widths during QAT on ResNet18.
When initializing the model, I set up a custom qconfig for selected modules, e.g.:

import torch
import torch.ao.quantization as quant

self.qconfig = quant.get_default_qat_qconfig("x86")
...
# activation_bits, weight_bits, weight_observer and weight_scheme
# are all loaded from a config file
custom_qconfig = quant.QConfig(
    activation=quant.MinMaxObserver.with_args(
        quant_min=0,
        quant_max=(2**activation_bits - 1) // 2,
        dtype=torch.quint8,
        qscheme=torch.per_tensor_affine,
    ),
    weight=weight_observer.with_args(
        quant_min=-(2 ** (weight_bits - 1)) // 2,
        quant_max=(2 ** (weight_bits - 1) - 1) // 2,
        dtype=torch.qint8,
        qscheme=weight_scheme,
    ),
)
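
I then assign this qconfig to the selected submodules, roughly like this ("conv1" is only an example here; the actual layer names also come from the config file):

# Modules with their own .qconfig attribute keep it during prepare_qat;
# everything else falls back to the model-level self.qconfig
self.conv1.qconfig = custom_qconfig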

Then I follow the steps from the QAT tutorial, i.e. running the fuse_modules_qat and prepare_qat functions. After this step, however, my changes to quant_min/quant_max do not seem to be propagated into the fused modules, no matter what values I pass. Maybe they are not supposed to be, since those modules seem to pick up their attributes from the default self.qconfig = quant.get_default_qat_qconfig("x86") instead.
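
Concretely, the preparation looks roughly like this (simplified; the fusion list only shows the first conv/bn/relu group of torchvision's ResNet18):

model.train()  # prepare_qat expects the model in training mode
# Fuse conv + bn + relu triples into single QAT modules
model = quant.fuse_modules_qat(model, [["conv1", "bn1", "relu"]])
# Swap in QAT module variants and attach fake-quantize modules
model = quant.prepare_qat(model)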

# output of print(model.conv1)

ConvBnReLU2d(
  3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
  (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (weight_fake_quant): FusedMovingAvgObsFakeQuantize(
    fake_quant_enabled=tensor([1], device='cuda:0'), observer_enabled=tensor([1], device='cuda:0'), scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32), dtype=torch.qint8, quant_min=-128, quant_max=127, qscheme=torch.per_channel_symmetric, reduce_range=False
    (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=tensor([], device='cuda:0'), max_val=tensor([], device='cuda:0'))
  )
  (activation_post_process): FusedMovingAvgObsFakeQuantize(
    fake_quant_enabled=tensor([1], device='cuda:0'), observer_enabled=tensor([1], device='cuda:0'), scale=tensor([1.], device='cuda:0'), zero_point=tensor([0], device='cuda:0', dtype=torch.int32), dtype=torch.quint8, quant_min=0, quant_max=127, qscheme=torch.per_tensor_affine, reduce_range=True
    (activation_post_process): MovingAverageMinMaxObserver(min_val=inf, max_val=-inf)
  )
)

My question is: is there a way to do this? I would like to apply a different quantization bit width to each layer (see the sketch below for what I am after).
Maybe I am just missing something, as I am still new to this, so any help is appreciated!
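
For illustration, this is roughly what I would like to end up with; make_qconfig is a hypothetical helper that would wrap the QConfig construction shown above:

# make_qconfig is hypothetical: it would build a QConfig like the
# one above for the given bit widths
self.conv1.qconfig = make_qconfig(activation_bits=4, weight_bits=4)
self.layer1.qconfig = make_qconfig(activation_bits=8, weight_bits=6)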

Thanks in advance!