We are attempting to replicate the logic of Dynamic Quantization for LSTMs. While the observer scheme for the weights is clear from the default_qconfig for Dynamic Quantization, the scheme used for the activations isn't clear to me. I have tried the MinMaxObserver, MovingAverageMinMaxObserver, and the HistogramObserver, but the results I obtain do not match those I get from using the DynamicQuantizedLSTM directly. I would appreciate any input on the scheme used for the activations.
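For context, the baseline we are comparing our hand-rolled observers against is the standard dynamic-quantization path. Roughly, it looks like the sketch below (the layer sizes, seed, and wrapper module are arbitrary choices for illustration, not our actual model):

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    """Thin wrapper so quantize_dynamic swaps the LSTM submodule."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=4, hidden_size=8)

    def forward(self, x):
        out, _ = self.lstm(x)
        return out

torch.manual_seed(0)
model = Model().eval()

# Dynamic quantization: weights are quantized ahead of time (per the
# default dynamic qconfig), activations are handled at runtime.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.LSTM}, dtype=torch.qint8
)

x = torch.randn(5, 3, 4)  # (seq_len, batch, input_size)
out_fp = model(x)
out_q = qmodel(x)

# The two outputs agree only approximately; the residual gap is what
# we are trying to reproduce with an explicit activation observer.
print((out_fp - out_q).abs().max())
```

Our question is which observer (if any) reproduces the activation handling inside `qmodel.lstm` above, since none of the three observers we tried closes this gap.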
P.S. The overarching objective of this exercise is to implement Static Quantization and QAT for LSTMs, which PyTorch does not currently support; this would be a very welcome feature.