Current status of automatic quantization support

Hi, I remember from one of the talks at the last dev con (or elsewhere) that PyTorch v1.4 would ship with support for automatic quantization of jitted models. v1.4 is out now, but it is not clear from the release notes whether this feature is official.

Is automatic quantization ready for experimental use? Can it already quantize an entire network from torchvision, for example?

Thanks
masa

Yes, we are very close to having graph mode ready; there are still a few changes that haven’t been completed yet. (cc @jerryzh168 for updates)

1 Like

Yes, graph mode quantization will be ready for testing after https://github.com/pytorch/pytorch/pull/32303 lands.
Next we are going to test more models and fix the issues we find in the process; this might take a few more weeks to a month or so. In the meantime, feel free to check out master and test graph mode quantization on your model as well.
I’ll reply here when this is ready for trying out.
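
In the meantime, here is a minimal sketch of what an invocation on a torchvision model might look like, modeled on the quantize_script calls in test_quantization.py. The import path of quantize_script, the qconfig_dict key convention, and the eval_fn signature are assumptions and may differ on current master:

import torch
import torchvision
from torch.quantization import default_qconfig
# Assumed import location; on master this may live under
# torch.quantization._quantize_script instead.
from torch.quantization import quantize_script

model = torch.jit.script(torchvision.models.resnet18(pretrained=True).eval())
qconfig_dict = {'': default_qconfig}  # '' = apply the qconfig to the whole model
calib_data = [torch.rand(1, 3, 224, 224)]

def eval_fn(model, data):
    # Run calibration batches so the observers record activation statistics.
    for x in data:
        model(x)

quantized = quantize_script(model, qconfig_dict, eval_fn, [calib_data],
                            inplace=False)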

2 Likes

Thanks for the replies. I’m already using a master build, so I’ll try it out. I have an image segmentation model to test; it always uncovers issues that folks working with ImageNet models do not see.

@jerryzh168 Are you open to contributions? I’ve never contributed to a project as large as PyTorch, but I’m interested in the opportunity.

1 Like

Yes, contributions are very welcome, especially for expanding the coverage of graph mode quantization to more models. Currently we are still trying to make the backbone support work, so a good starting point might be after https://github.com/pytorch/pytorch/pull/32816 lands. After that PR we will already be pretty close to the eager mode result; there might be more PRs coming, but there shouldn’t be too many.

1 Like

Hi @jerryzh168, I tried the graph mode quantization test cases in test_quantization.py. In test_conv, I dumped the quantized IR like so:

# Run graph mode quantization, calibrating with the stored image data.
model_quantized = quantize_script(
    model_under_test,
    qconfig_dict,
    default_eval_fn,
    [self.img_data],
    inplace=False)
# The quantized model should match the eager mode reference result.
self.assertEqual(model_quantized(self.img_data[0][0]), result_eager)

# Inline the graph so the full IR is visible, then dump it.
torch._C._jit_pass_inline(model_quantized.graph)
print(model_quantized.graph)

But the output is confusing. Some quantization-related operators have been inserted, but they just perform a quantize/dequantize round trip: the input and weight are dequantized right before use, so the actual convolution (the aten::conv2d at %26 below) still runs in fp32.

graph(%self.1 : __torch__.torch.nn.modules.module.___torch_mangle_7.Module,
      %input : Float(2, 3, 10, 10)):
  %2 : __torch__.torch.nn.modules.module.___torch_mangle_8.Module = prim::GetAttr[name="conv"](%self.1)
  %4 : float = prim::GetAttr[name="input.1_scale"](%2)
  %5 : int = prim::GetAttr[name="input.1_zero_point"](%2)
  %6 : int = prim::GetAttr[name="input.1_scalar_type"](%2)
  %input.1.quant : Tensor = aten::quantize_per_tensor(%input, %4, %5, %6)
  %input.1.dequant.0 : Tensor = aten::dequantize(%input.1.quant)
  %9 : bool = prim::Constant[value=1](), scope: __module.conv # /home/masa/work/pytorch/pytorch/torch/nn/modules/conv.py:345:0
  %10 : bool = prim::Constant[value=0](), scope: __module.conv # /home/masa/work/pytorch/pytorch/torch/nn/modules/conv.py:345:0
  %11 : int = prim::Constant[value=1](), scope: __module.conv # /home/masa/work/pytorch/pytorch/torch/nn/modules/conv.py:345:0
  %12 : None = prim::Constant(), scope: __module.conv
  %13 : Tensor = prim::GetAttr[name="weight"](%2)
  %14 : float = prim::GetAttr[name="8_scale"](%2)
  %15 : int = prim::GetAttr[name="8_zero_point"](%2)
  %16 : int = prim::GetAttr[name="8_scalar_type"](%2)
  %8.quant : Tensor = aten::quantize_per_tensor(%13, %14, %15, %16)
  %18 : int[] = prim::Constant[value=[1, 1]]()
  %19 : int[] = prim::Constant[value=[0, 0]]()
  %20 : int[] = prim::Constant[value=[1, 1]]()
  %21 : int[] = prim::Constant[value=[0, 0]]()
  %22 : Tensor = quantized::conv2d_prepack(%8.quant, %12, %18, %19, %20, %11)
  %23 : Tensor, %24 : Tensor? = quantized::conv2d_unpack(%22)
  %25 : Tensor = aten::dequantize(%23)
  %26 : Tensor = aten::conv2d(%input.1.dequant.0, %25, %24, %18, %19, %20, %11)
  %27 : float = prim::GetAttr[name="15_scale"](%2)
  %28 : int = prim::GetAttr[name="15_zero_point"](%2)
  %29 : int = prim::GetAttr[name="15_scalar_type"](%2)
  %15.quant : Tensor = aten::quantize_per_tensor(%26, %27, %28, %29)
  %15.dequant.0 : Tensor = aten::dequantize(%15.quant)
  return (%15.dequant.0)

Is this expected? I was hoping to see a quantized::conv2d op there.
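
For comparison, the eager mode quantized module shows the behavior I expected: the convolution dispatches to the quantized::conv2d kernel and stays on quantized tensors throughout, with no dequantize before the conv. (A minimal sketch; the scale/zero_point values are arbitrary, and it assumes a build with a quantized backend such as FBGEMM.)

import torch

# Eager mode quantized conv: runs through quantized::conv2d on quint8
# tensors instead of dequantizing back to fp32 first.
qconv = torch.nn.quantized.Conv2d(3, 3, kernel_size=3)
x = torch.rand(2, 3, 10, 10)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=128,
                               dtype=torch.quint8)
out = qconv(qx)
print(out.dtype)  # torch.quint8 -- the output is still a quantized tensor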

I’m using torch built from source; the commit is in the version string:

In [2]: torch.__version__
Out[2]: '1.5.0a0+c75d06d'

Yes, graph mode quantization is still in active development; we should have something working very soon, though. Please stay tuned.

1 Like