Wrong gradients in quantization-aware training

Hi, I use the `_fake_quantize_learnable_per_tensor_affine` API as a component of my quantization layer. I found that in some cases this API produces wrong gradients for `scale`, `zero_point`, and the input: the gradients computed by `backward()` differ from the ones I calculated by hand, and sometimes even come out as `nan`. After I replaced `_fake_quantize_learnable_per_tensor_affine` with my own implementation, the `nan`s disappeared. My PyTorch version is 1.8.2.

Hi, is there any update on this?

Hi @sherylwang , do you have an example that could reproduce the behavior? Typically a small representative model would help us understand and debug the problem better.