Current status of automatic quantization support

Hi, I remember from one of the talks at the last dev con (or elsewhere) that PyTorch v1.4 would ship with support for automatic quantization of jitted models. v1.4 is out now, but it is not clear from the release notes whether this feature is official.

Is automatic quantization ready for experimental use? Can it already quantize an entire network from torchvision, for example?

Thanks
masa

Yes, we are very close to having graph mode ready; there are still a few changes that haven’t been completed yet. (cc @jerryzh168 for updates)

1 Like

Yes, graph mode quantization will be ready for testing after https://github.com/pytorch/pytorch/pull/32303 lands.
Next we are going to test more models and fix the issues we find in the process; this might take a few more weeks to a month or so. In the meantime, feel free to check out master and test graph mode quantization on your model as well.
I’ll reply here when this is ready for trying out.
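
In the meantime, here is a minimal sketch of what an invocation on a torchvision model might look like, modeled on the quantize_script calls in test_quantization.py. The import path of quantize_script, the qconfig_dict key convention, and the eval_fn signature are assumptions and may differ on current master:

import torch
import torchvision
from torch.quantization import default_qconfig
# Assumed import location; on master this may live under
# torch.quantization._quantize_script instead.
from torch.quantization import quantize_script

model = torch.jit.script(torchvision.models.resnet18(pretrained=True).eval())
qconfig_dict = {'': default_qconfig}  # '' = apply the qconfig to the whole model
calib_data = [torch.rand(1, 3, 224, 224)]

def eval_fn(model, data):
    # Run calibration batches so the observers record activation statistics.
    for x in data:
        model(x)

quantized = quantize_script(model, qconfig_dict, eval_fn, [calib_data],
                            inplace=False)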

2 Likes

Thanks for the replies. I’m already using a master build, so I’ll try it out. I have an image segmentation model to test; it always uncovers issues that folks working with ImageNet models do not see.

@jerryzh168 Are you open to contributions? I’ve never contributed to a project as large as PyTorch, but I’m interested in the opportunity.

1 Like

Yes, contributions are very welcome, especially for expanding the coverage of graph mode quantization to more models. Currently we are still trying to make the backbone support work, so a good starting point might be after https://github.com/pytorch/pytorch/pull/32816 lands. After that PR we will already be pretty close to the eager mode result; there might be more PRs coming, but there shouldn’t be too many.

1 Like

Hi @jerryzh168, I tried the graph mode quantization test cases in test_quantization.py. In test_conv, I dumped the quantized IR like so:

# Run graph mode quantization, calibrating with the stored image data.
model_quantized = quantize_script(
    model_under_test,
    qconfig_dict,
    default_eval_fn,
    [self.img_data],
    inplace=False)
# The quantized model should match the eager mode reference result.
self.assertEqual(model_quantized(self.img_data[0][0]), result_eager)

# Inline the graph so the full IR is visible, then dump it.
torch._C._jit_pass_inline(model_quantized.graph)
print(model_quantized.graph)

But the output is confusing. Some quantization-related operators have been inserted, but they just perform a quantize/dequantize round trip: the input and weight are dequantized right before use, so the actual convolution (the aten::conv2d at %26 below) still runs in fp32.

graph(%self.1 : __torch__.torch.nn.modules.module.___torch_mangle_7.Module,
      %input : Float(2, 3, 10, 10)):
  %2 : __torch__.torch.nn.modules.module.___torch_mangle_8.Module = prim::GetAttr[name="conv"](%self.1)
  %4 : float = prim::GetAttr[name="input.1_scale"](%2)
  %5 : int = prim::GetAttr[name="input.1_zero_point"](%2)
  %6 : int = prim::GetAttr[name="input.1_scalar_type"](%2)
  %input.1.quant : Tensor = aten::quantize_per_tensor(%input, %4, %5, %6)
  %input.1.dequant.0 : Tensor = aten::dequantize(%input.1.quant)
  %9 : bool = prim::Constant[value=1](), scope: __module.conv # /home/masa/work/pytorch/pytorch/torch/nn/modules/conv.py:345:0
  %10 : bool = prim::Constant[value=0](), scope: __module.conv # /home/masa/work/pytorch/pytorch/torch/nn/modules/conv.py:345:0
  %11 : int = prim::Constant[value=1](), scope: __module.conv # /home/masa/work/pytorch/pytorch/torch/nn/modules/conv.py:345:0
  %12 : None = prim::Constant(), scope: __module.conv
  %13 : Tensor = prim::GetAttr[name="weight"](%2)
  %14 : float = prim::GetAttr[name="8_scale"](%2)
  %15 : int = prim::GetAttr[name="8_zero_point"](%2)
  %16 : int = prim::GetAttr[name="8_scalar_type"](%2)
  %8.quant : Tensor = aten::quantize_per_tensor(%13, %14, %15, %16)
  %18 : int[] = prim::Constant[value=[1, 1]]()
  %19 : int[] = prim::Constant[value=[0, 0]]()
  %20 : int[] = prim::Constant[value=[1, 1]]()
  %21 : int[] = prim::Constant[value=[0, 0]]()
  %22 : Tensor = quantized::conv2d_prepack(%8.quant, %12, %18, %19, %20, %11)
  %23 : Tensor, %24 : Tensor? = quantized::conv2d_unpack(%22)
  %25 : Tensor = aten::dequantize(%23)
  %26 : Tensor = aten::conv2d(%input.1.dequant.0, %25, %24, %18, %19, %20, %11)
  %27 : float = prim::GetAttr[name="15_scale"](%2)
  %28 : int = prim::GetAttr[name="15_zero_point"](%2)
  %29 : int = prim::GetAttr[name="15_scalar_type"](%2)
  %15.quant : Tensor = aten::quantize_per_tensor(%26, %27, %28, %29)
  %15.dequant.0 : Tensor = aten::dequantize(%15.quant)
  return (%15.dequant.0)

Is this expected? I was hoping to see a quantized::conv2d op there.
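
For comparison, the eager mode quantized module shows the behavior I expected: the convolution dispatches to the quantized::conv2d kernel and stays on quantized tensors throughout, with no dequantize before the conv. (A minimal sketch; the scale/zero_point values are arbitrary, and it assumes a build with a quantized backend such as FBGEMM.)

import torch

# Eager mode quantized conv: runs through quantized::conv2d on quint8
# tensors instead of dequantizing back to fp32 first.
qconv = torch.nn.quantized.Conv2d(3, 3, kernel_size=3)
x = torch.rand(2, 3, 10, 10)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=128,
                               dtype=torch.quint8)
out = qconv(qx)
print(out.dtype)  # torch.quint8 -- the output is still a quantized tensor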

I’m using torch built from source; the commit is in the version string:

In [2]: torch.__version__
Out[2]: '1.5.0a0+c75d06d'

Yes, graph mode quantization is still in active development; we should have something working very soon, though. Please stay tuned.

1 Like