I’m trying to build a customized quantizer that takes an FP32 model as input and outputs a quantized model. Basically I just need to quantize all convolution layers, and I have a few questions:

- Should I replace all `nn.Conv2d` modules with `nn.quantized.Conv2d` modules?
- If I set an `nn.Conv2d` module’s weight to dtype `torch.int8` (not `torch.qint8`), feed it an input tensor of dtype `torch.int8`, and remove the bias, does it behave like an `nn.quantized.Conv2d`?
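For context, here is a minimal sketch of the eager-mode static quantization flow I’ve been experimenting with (the tiny model and shapes are just for illustration, and I’m assuming a recent PyTorch where `torch.quantization.prepare`/`convert` are available). After `convert`, the `nn.Conv2d` does get swapped for `nn.quantized.Conv2d`, and its weight comes out as `torch.qint8` (carrying a scale and zero point), not plain `torch.int8`:

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    """Toy FP32 model with a single conv layer, for illustration only."""
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()    # float -> quantized at the input
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.dequant = torch.quantization.DeQuantStub() # quantized -> float at the output

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

model = TinyConvNet().eval()
# fbgemm is the default x86 backend; qnnpack would be used on ARM
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")

# Insert observers, run a calibration pass, then convert
prepared = torch.quantization.prepare(model)
prepared(torch.randn(1, 3, 16, 16))  # calibration data (random here, just a sketch)
quantized = torch.quantization.convert(prepared)

# The conv module is now a quantized Conv2d, and its weight is qint8,
# i.e. an integer tensor that also carries quantization parameters
print(type(quantized.conv))
print(quantized.conv.weight().dtype)
```

My understanding so far is that this is why a plain `torch.int8` weight would not be enough: `torch.qint8` tensors carry the scale/zero-point metadata that the quantized kernels need, while `torch.int8` is just raw integers.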