If my model consists of deep, nested layers, should I insert quant and dequant into every layer?
If all the operators in the model can be quantized, you only need to insert a quant at the beginning of the model and a dequant at the end. See the tutorial "(beta) Static Quantization with Eager Mode in PyTorch" (PyTorch Tutorials 1.10.0+cu102 documentation) for more details.
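For illustration, here is a minimal sketch of the fully quantizable case in eager mode. The `SmallNet` model, its layer sizes, and the random calibration data are made up for the example; it also assumes a PyTorch build with a quantized backend available and uses the `torch.quantization` namespace from the 1.10 tutorials (newer releases expose the same functions under `torch.ao.quantization`).

```python
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    """Hypothetical model in which every operator can be quantized."""

    def __init__(self):
        super().__init__()
        # One quant at the very start, one dequant at the very end.
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)            # float -> quantized
        x = self.relu(self.conv(x))  # runs on quantized tensors after convert()
        return self.dequant(x)       # quantized -> float

model = SmallNet().eval()

# Use the qconfig matching whichever quantized backend this build selected.
engine = torch.backends.quantized.engine
model.qconfig = torch.quantization.get_default_qconfig(engine)

# Insert observers, calibrate on (here: random) data, then convert.
prepared = torch.quantization.prepare(model)
with torch.no_grad():
    prepared(torch.randn(4, 3, 16, 16))
quantized = torch.quantization.convert(prepared)

with torch.no_grad():
    out = quantized(torch.randn(1, 3, 16, 16))
```

Because the qconfig is set on the root module, it propagates to all nested children, so no per-layer stubs are needed no matter how deep the nesting is.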
If only specific layers can be quantized, you would have to wrap them individually in quant-dequant blocks in eager mode. Alternatively, you could try the FX graph mode quantization flow, which should automate the process: see "(prototype) FX Graph Mode Post Training Static Quantization" (PyTorch Tutorials 1.10.0+cu102 documentation).
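As a rough sketch of the eager-mode option, the snippet below wraps a single layer with `torch.quantization.QuantWrapper`, which adds a quant before and a dequant after the wrapped module. The `MixedNet` model and the choice of which layers stay in float are hypothetical, picked only to show the pattern; the same backend and namespace assumptions as in the 1.10 tutorials apply.

```python
import torch
import torch.nn as nn

class MixedNet(nn.Module):
    """Hypothetical model where only fc1 is quantized."""

    def __init__(self):
        super().__init__()
        # QuantWrapper surrounds the layer with its own quant/dequant pair.
        self.fc1 = torch.quantization.QuantWrapper(nn.Linear(16, 32))
        self.act = nn.Tanh()         # left in float for this example
        self.fc2 = nn.Linear(32, 4)  # left in float for this example

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

model = MixedNet().eval()

# Set the qconfig only on the wrapped layer; modules without a qconfig
# are skipped by prepare() and convert() and stay in float.
engine = torch.backends.quantized.engine
model.fc1.qconfig = torch.quantization.get_default_qconfig(engine)

prepared = torch.quantization.prepare(model)
with torch.no_grad():
    prepared(torch.randn(8, 16))  # calibration pass on random data
quantized = torch.quantization.convert(prepared)

with torch.no_grad():
    out = quantized(torch.randn(2, 16))
```

The qconfig is assigned to the wrapper rather than to the inner `nn.Linear` so that it propagates to the wrapper's quant and dequant stubs as well as to the wrapped layer.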