Quantization-aware training for GPT2

Hi Jerry,

Thanks for the explanation.

  1. In your example, the input is quantized from fp32 to int8 by the QuantStub module, but what about the weights in the layers (linear or conv, for example)? From your example it seems we don't need to quantize the weights explicitly; is that right?

  2. What about the outputs of previous layers, for example the output of a preceding linear or activation layer? I understand that after the computation the result is fp32 again, so do we need to put a QuantStub between two layers? (I've put a minimal sketch of the setup I mean after this list.)
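
To make the questions concrete, here is a minimal sketch of the kind of module I have in mind, assuming the eager-mode QuantStub/DeQuantStub API from torch.ao.quantization (the model name, layer sizes, and layer choices are just placeholders, not from your example):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import QuantStub, DeQuantStub

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()          # quantizes the fp32 input to int8
        self.linear1 = nn.Linear(16, 16)  # question 1: where do this layer's weights get quantized?
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(16, 16)
        self.dequant = DeQuantStub()      # converts the quantized output back to fp32

    def forward(self, x):
        x = self.quant(x)     # input: fp32 -> quantized
        x = self.linear1(x)
        x = self.relu(x)
        x = self.linear2(x)   # question 2: is another QuantStub needed before this layer?
        return self.dequant(x)
```

I was planning to set a qconfig on the model and run the usual prepare/convert flow on something like this, so I want to make sure I'm placing the stubs correctly before applying it to GPT2.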