Thank you, Jerry, this helps.
To clarify:
"In eager mode quantization"
That is the only mode available at the moment, correct? The JIT does not quantize (yet?).
This might sound like a nit, but I think the following actually reflects a fundamental difficulty in quantizing in eager mode:
"so the quantization of the input Tensor should be taken care of by the previous module"
How does PyTorch decide what is “previous”? The true sequence of layer invocations is determined by the procedural code in forward(),
and it can involve branching at runtime or data flow merging, as with skip connections.
(I suspect this is why I have not yet succeeded in quantizing ResNet.)
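To make the question about “previous” concrete, here is a minimal sketch of what I mean. The module `SkipBlock` and the helper `trace_calls` are hypothetical names I made up for illustration; they are not part of any PyTorch quantization API, and the sketch says nothing about how the quantization tooling actually resolves this. It only shows that the order in which submodules are defined, the order in which they run, and the number of producers feeding an op can all differ.

```python
import torch
import torch.nn as nn


class SkipBlock(nn.Module):
    """Hypothetical block: definition order differs from call order."""

    def __init__(self):
        super().__init__()
        # Registered in the "wrong" order on purpose.
        self.second = nn.Linear(4, 4)
        self.first = nn.Linear(4, 4)

    def forward(self, x):
        y = self.first(x)
        # Runtime branch: whether `second` runs depends on the data.
        if y.sum() > 0:
            y = self.second(y)
        # Skip connection: the addition has two producers (x and y),
        # so there is no single "previous module" for it.
        return x + y


def trace_calls(model, x):
    """Return the names of submodules in the order they actually ran."""
    calls = []
    handles = [
        m.register_forward_hook(
            lambda mod, inp, out, name=name: calls.append(name)
        )
        for name, m in model.named_modules()
        if name  # skip the root module, whose name is ""
    ]
    try:
        model(x)
    finally:
        for h in handles:
            h.remove()
    return calls
```

Comparing `[n for n, _ in SkipBlock().named_children()]` with `trace_calls(SkipBlock(), torch.randn(2, 4))` shows the mismatch: the registration order is fixed, while the traced order depends on what forward() did for that particular input.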