Hi @jerryzh168, I tried the graph mode quantization test cases in test_quantization.py. In `test_conv`, I dumped the quantized IR like so:
```python
model_quantized = quantize_script(
    model_under_test,
    qconfig_dict,
    default_eval_fn,
    [self.img_data],
    inplace=False)
self.assertEqual(model_quantized(self.img_data[0][0]), result_eager)
torch._C._jit_pass_inline(model_quantized.graph)
print(model_quantized.graph)
```
But the output is confusing: some quantization-related operators are inserted, but they only perform a quantize/dequantize round trip, and the actual convolution still runs in fp32.
```
graph(%self.1 : __torch__.torch.nn.modules.module.___torch_mangle_7.Module,
      %input : Float(2, 3, 10, 10)):
  %2 : __torch__.torch.nn.modules.module.___torch_mangle_8.Module = prim::GetAttr[name="conv"](%self.1)
  %4 : float = prim::GetAttr[name="input.1_scale"](%2)
  %5 : int = prim::GetAttr[name="input.1_zero_point"](%2)
  %6 : int = prim::GetAttr[name="input.1_scalar_type"](%2)
  %input.1.quant : Tensor = aten::quantize_per_tensor(%input, %4, %5, %6)
  %input.1.dequant.0 : Tensor = aten::dequantize(%input.1.quant)
  %9 : bool = prim::Constant[value=1](), scope: __module.conv # /home/masa/work/pytorch/pytorch/torch/nn/modules/conv.py:345:0
  %10 : bool = prim::Constant[value=0](), scope: __module.conv # /home/masa/work/pytorch/pytorch/torch/nn/modules/conv.py:345:0
  %11 : int = prim::Constant[value=1](), scope: __module.conv # /home/masa/work/pytorch/pytorch/torch/nn/modules/conv.py:345:0
  %12 : None = prim::Constant(), scope: __module.conv
  %13 : Tensor = prim::GetAttr[name="weight"](%2)
  %14 : float = prim::GetAttr[name="8_scale"](%2)
  %15 : int = prim::GetAttr[name="8_zero_point"](%2)
  %16 : int = prim::GetAttr[name="8_scalar_type"](%2)
  %8.quant : Tensor = aten::quantize_per_tensor(%13, %14, %15, %16)
  %18 : int[] = prim::Constant[value=[1, 1]]()
  %19 : int[] = prim::Constant[value=[0, 0]]()
  %20 : int[] = prim::Constant[value=[1, 1]]()
  %21 : int[] = prim::Constant[value=[0, 0]]()
  %22 : Tensor = quantized::conv2d_prepack(%8.quant, %12, %18, %19, %20, %11)
  %23 : Tensor, %24 : Tensor? = quantized::conv2d_unpack(%22)
  %25 : Tensor = aten::dequantize(%23)
  %26 : Tensor = aten::conv2d(%input.1.dequant.0, %25, %24, %18, %19, %20, %11)
  %27 : float = prim::GetAttr[name="15_scale"](%2)
  %28 : int = prim::GetAttr[name="15_zero_point"](%2)
  %29 : int = prim::GetAttr[name="15_scalar_type"](%2)
  %15.quant : Tensor = aten::quantize_per_tensor(%26, %27, %28, %29)
  %15.dequant.0 : Tensor = aten::dequantize(%15.quant)
  return (%15.dequant.0)
```
Is this expected? I was hoping to see a `quantized::conv2d` op there.
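For reference, what the graph above computes is numerically equivalent to this eager-mode sketch (illustrative only; the scale/zero-point values are made up, not the ones the observers recorded). Both input and weight are quantized and then immediately dequantized, so the convolution itself still executes in fp32:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 10, 10)   # input, as in self.img_data
w = torch.randn(4, 3, 3, 3)     # conv weight (hypothetical shape)

# The graph quantizes the input, then immediately dequantizes it...
xq = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
x_dq = xq.dequantize()

# ...and does the same round trip for the weight...
wq = torch.quantize_per_tensor(w, scale=0.05, zero_point=0, dtype=torch.qint8)
w_dq = wq.dequantize()

# ...so the convolution runs on fp32 tensors, not via quantized::conv2d.
y = F.conv2d(x_dq, w_dq)
print(y.dtype)  # torch.float32
```

If the graph had been fully lowered, I would instead expect a single `quantized::conv2d` consuming the quantized input directly.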
I’m using torch built from source; the commit is:
```
In [2]: torch.__version__
Out[2]: '1.5.0a0+c75d06d'
```