Hi, I'd like to selectively quantize layers, since some layers in my project just serve as regularizers. I tried a few approaches and got confused by the following results.
```python
import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(28 * 28, 10)
        self.relu1 = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu1(self.l1(x.view(x.size(0), -1)))
```
1. Selective qconfig assignment and top-level transform
```python
model = LeNet()
model.l1.qconfig = torch.quantization.get_default_qat_qconfig()
torch.quantization.prepare_qat(model, inplace=True)
print(model)
```
2. Selective qconfig assignment and selective transform
```python
model2 = LeNet()
model2.l1.qconfig = torch.quantization.get_default_qat_qconfig()
torch.quantization.prepare_qat(model2.l1, inplace=True)
print(model2)
```
You can see that in the 2nd case, `l1` doesn't get a `(weight_fake_quant): FakeQuantize` module. Is this the correct behavior? Shouldn't both approaches yield the same transformed model?
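In case the printed output isn't clear, this is roughly how I've been checking the difference, just by inspecting the prepared modules (the type names in the comments are what I believe the default QAT swap produces, not something I'm fully sure of):

```python
# Case 1: l1 should have been swapped to a QAT module that carries a weight fake-quant.
print(type(model.l1))                           # presumably torch.nn.qat.Linear
print(hasattr(model.l1, "weight_fake_quant"))   # True in my run

# Case 2: l1 appears to stay a plain Linear, with no weight fake-quant attached.
print(type(model2.l1))                          # still torch.nn.Linear
print(hasattr(model2.l1, "weight_fake_quant"))  # False in my run
```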
Also, if there is a better way to do selective quantization (e.g., different bit-widths per layer, or quantizing some layers while leaving others in float, roughly as in the sketch below), please advise.
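To make that second question concrete, this is the kind of thing I'm hoping the eager-mode API supports. It's only a sketch: the 4-bit `quant_min`/`quant_max` range, the observer choices, and the hypothetical `TwoLayerNet` with a second layer `l2` are my own guesses, not something I've verified end to end.

```python
import torch
import torch.nn as nn
from torch.quantization import (
    QConfig,
    FakeQuantize,
    MovingAverageMinMaxObserver,
    default_fake_quant,
    prepare_qat,
)

# Hypothetical 4-bit weight fake-quant (range chosen for signed 4-bit values).
weight_fake_quant_4bit = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=-8,
    quant_max=7,
    dtype=torch.qint8,
    qscheme=torch.per_tensor_symmetric,
)

qconfig_4bit = QConfig(activation=default_fake_quant, weight=weight_fake_quant_4bit)

class TwoLayerNet(nn.Module):  # hypothetical model with two linear layers
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(28 * 28, 32)
        self.l2 = nn.Linear(32, 10)  # serves as a regularizer, should stay in float

    def forward(self, x):
        return self.l2(torch.relu(self.l1(x.view(x.size(0), -1))))

model = TwoLayerNet()
model.l1.qconfig = qconfig_4bit   # quantize l1 with the custom 4-bit qconfig
model.l2.qconfig = None           # explicitly exclude l2 from quantization
prepare_qat(model, inplace=True)  # top-level transform, as in case 1
print(model)
```

Is per-layer qconfig assignment plus a single top-level `prepare_qat` call the intended way to do this, or is there a cleaner mechanism?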