PyTorch quantization converts the INT32 MAC (multiply-accumulate) value into 8-bit in the backend. How can we access this compute-layer INT32 value prior to conversion?
Unfortunately, the INT32 value for fbgemm/qnnpack is not accessible from outside. Why do you need it?
I need to impose some limits on the accumulated values. How would that be feasible in the quantized model format?
Can you describe the whole flow? Are you trying to impose the limit at the kernel level, or when people train the model?
If you need a kernel that imposes limits on the INT32 value, then I think the best thing to do is to reimplement the kernel (possibly by calling the fbgemm implementations if you need high performance: GitHub - pytorch/FBGEMM: FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/).
Actually, you might be able to modify the operator implementation a little and implement your own version of ops like quantized::conv: pytorch/qconv.cpp at master · pytorch/pytorch · GitHub
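Before going as far as a custom kernel, one way to prototype the effect of an accumulator limit is to emulate the INT32 accumulation outside the quantized backend. The sketch below is not the fbgemm code path; it is a minimal NumPy emulation of how a quantized linear op accumulates zero-point-corrected INT8 products into INT32, with a hypothetical clamp applied before requantization. The function name, the `acc_min`/`acc_max` parameters, and the choice of a plain clamp are all my assumptions for illustration.

```python
import numpy as np

def quantized_linear_with_clamp(x_q, w_q, x_zp, w_zp, acc_min, acc_max):
    """Emulate the INT32 accumulator of a quantized linear op.

    x_q: (batch, in_features) uint8 activations
    w_q: (out_features, in_features) uint8 weights
    x_zp, w_zp: zero points of activations and weights
    acc_min, acc_max: hypothetical limits imposed on the accumulator
    """
    # Zero-point-corrected INT32 accumulation, as a quantized backend
    # would compute it internally before requantizing to 8-bit.
    acc = (x_q.astype(np.int32) - x_zp) @ (w_q.astype(np.int32) - w_zp).T
    # The step that is normally inaccessible: limit the accumulator
    # before any requantization takes place.
    return np.clip(acc, acc_min, acc_max)

# Example: one input row, one output channel.
x_q = np.array([[200, 100]], dtype=np.uint8)   # corrected values: 72, -28
w_q = np.array([[130, 120]], dtype=np.uint8)   # corrected values: 2, -8
out = quantized_linear_with_clamp(x_q, w_q, 128, 128, -100, 100)
# unclamped accumulator: 72*2 + (-28)*(-8) = 368, clamped to 100
print(out)
```

If the emulation confirms the limit does what you want, the same clamp would then be ported into a custom operator (e.g. a modified qconv.cpp), applied to the INT32 buffer just before the requantization step.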