I’m trying to perform QAT on a GPT-2 model, but I’m a bit confused by the documentation regarding the QuantStub.
Where should I place the QuantStub and DeQuantStub? My understanding is that the first QuantStub goes right after the embedding layer, and the DeQuantStub goes after the ReLU activation of the FFN; then each subsequent QuantStub sits right after the previous DeQuantStub, i.e. before the second linear layer of that decoder layer’s FFN. Is that correct? Here’s a sketch of what I mean (see below).
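To make the placement concrete, here is a minimal toy sketch of the layout I have in mind. The module names (`fc1`, `relu`, etc.) are mine, not Hugging Face’s, attention and LayerNorm are omitted, and I’m assuming a ReLU activation as in my question:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import QuantStub, DeQuantStub

class TinyGPT2FFN(nn.Module):
    """Toy stand-in for the embedding + one decoder layer's FFN path."""
    def __init__(self, vocab_size=50257, d_model=768, d_ff=3072):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.quant_in = QuantStub()     # first QuantStub, right after the embedding
        self.fc1 = nn.Linear(d_model, d_ff)
        self.relu = nn.ReLU()           # assuming ReLU here, as in my question
        self.dequant = DeQuantStub()    # DeQuantStub after the activation
        self.quant_mid = QuantStub()    # re-quantize before the second linear
        self.fc2 = nn.Linear(d_ff, d_model)
        self.dequant_out = DeQuantStub()

    def forward(self, ids):
        x = self.quant_in(self.embed(ids))
        x = self.relu(self.fc1(x))
        x = self.quant_mid(self.dequant(x))  # the dequant -> quant pair I described
        x = self.fc2(x)
        return self.dequant_out(x)
```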
Also, I can only fuse the first linear layer and the ReLU activation of the FFN in each decoder layer, right? (Something like the fusion call sketched below.)
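If so, I’d expect the QAT preparation to look roughly like this (building on the toy module above; `"fc1"`/`"relu"` are just the names from my sketch, and I’m guessing the embedding has to be skipped by setting its qconfig to None, since the default QAT qconfig doesn’t apply to nn.Embedding):

```python
from torch.ao.quantization import (
    fuse_modules_qat,
    get_default_qat_qconfig,
    prepare_qat,
)

model = TinyGPT2FFN().train()  # QAT fusion expects the model in training mode

# Fuse the (Linear, ReLU) pair in the FFN before preparing for QAT.
model = fuse_modules_qat(model, [["fc1", "relu"]])

model.qconfig = get_default_qat_qconfig("fbgemm")
model.embed.qconfig = None  # leave the embedding in float
prepare_qat(model, inplace=True)
```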
Thanks in advance!