I really need to be able to do quantization-aware training on GRU layers, and PyTorch doesn’t support it yet. However, it does seem to support static quantization of LSTM layers through custom modules, so I used what is done in torch/ao/nn/quantizable/modules/rnn.py to make a quantizable version of the GRU layers.
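Concretely, my module follows this pattern (a rough sketch only; `QuantizableGRU` is my own name and the cell decomposition is elided, all modeled on the quantizable LSTM):

```python
import torch
from torch import nn

class QuantizableGRU(nn.Module):
    """Sketch of an observable GRU, modeled on torch.ao.nn.quantizable.LSTM.
    The real module decomposes each GRU layer into cells built from
    quantizable ops (nn.Linear, FloatFunctional for add/mul, ...)."""
    _FLOAT_MODULE = nn.GRU

    def __init__(self, input_size, hidden_size, num_layers=1, bias=True,
                 batch_first=False, dropout=0.0, bidirectional=False):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.batch_first = batch_first
        # ... per-layer GRU cells would go here, mirroring _LSTMLayer ...

    @classmethod
    def from_float(cls, other, qconfig=None):
        assert isinstance(other, cls._FLOAT_MODULE)
        assert hasattr(other, "qconfig") or qconfig
        observed = cls(other.input_size, other.hidden_size, other.num_layers,
                       other.bias, other.batch_first, other.dropout,
                       other.bidirectional)
        observed.qconfig = getattr(other, "qconfig", qconfig)
        # ... copy the float weights into the decomposed cells ...
        observed.eval()
        # Attach observers to the decomposed sub-modules, exactly as the
        # quantizable LSTM does at the end of its own from_float().
        return torch.ao.quantization.prepare(observed, inplace=True)
```

A few questions about this: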
- Am I on the right track by following this? From what I read, the only missing part at that point is the `from_observed()` method that can be found in torch/ao/nn/quantized/modules/rnn.py. (See the first sketch after this list.)
- What’s the use of calling `torch.ao.quantization.prepare()` in the `from_float()` method and `torch.ao.quantization.convert()` in the `from_observed()` method? Isn’t that already done when calling `prepare_qat()` or `convert()` in the main program, for example? (See the second sketch after this list.)
- In the quantization doc, it says “Currently, there is a requirement that ObservedCustomModule will have a single Tensor output, and an observer will be added by the framework (not by the user) on that output. The observer will be stored under the activation_post_process key as an attribute of the custom module instance. Relaxing these restrictions may be done at a future time”. Does that mean the forward method can’t output both the output features AND the final hidden state?
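On the first question, the quantized LSTM’s `from_observed()` is short, so I assume my GRU version would look something like this (a sketch; `QuantizedGRU` is my own name, reusing `QuantizableGRU` from the sketch above):

```python
import torch

class QuantizedGRU(QuantizableGRU):
    """Sketch mirroring torch.ao.nn.quantized.LSTM in rnn.py."""
    _FLOAT_MODULE = QuantizableGRU  # the observed class sketched above

    @classmethod
    def from_observed(cls, other):
        assert isinstance(other, cls._FLOAT_MODULE)
        # convert() swaps the observed sub-modules (with their collected
        # statistics) for their quantized counterparts.
        converted = torch.ao.quantization.convert(
            other, inplace=False, remove_qconfig=True)
        converted.__class__ = cls
        return converted
```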
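And on the second question, here is my current understanding of how the two levels fit together in the eager-mode static flow (a sketch; `MyModel` is a placeholder): the top-level `prepare()`/`convert()` treat the custom module as a leaf and only swap its class via `from_float()`/`from_observed()`, so it is the nested `prepare()`/`convert()` calls inside those methods that actually attach observers to, and later quantize, the inner sub-modules.

```python
import torch
from torch import nn

model = MyModel().eval()  # placeholder model containing an nn.GRU
model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")

# Top-level prepare(): swaps nn.GRU -> QuantizableGRU by calling
# QuantizableGRU.from_float(), which attaches observers internally.
prepared = torch.ao.quantization.prepare(
    model,
    prepare_custom_config_dict={
        "float_to_observed_custom_module_class": {nn.GRU: QuantizableGRU},
    },
)

# ... run calibration data through `prepared` here ...

# Top-level convert(): swaps QuantizableGRU -> QuantizedGRU by calling
# QuantizedGRU.from_observed(), which runs convert() on the observed
# sub-modules to produce the quantized kernels.
quantized = torch.ao.quantization.convert(
    prepared,
    convert_custom_config_dict={
        "observed_to_quantized_custom_module_class": {
            QuantizableGRU: QuantizedGRU,
        },
    },
)
```

Is that reading correct?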
Thanks a lot and have a nice day!