Help please!
I am currently working on a project where I need to deploy a PyTorch-based neural network model onto an FPGA for inference. My goal is to quantize the model so that both the activations and weights fall within the range of -128 to 127 (8-bit signed precision).
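For context, here is a minimal sketch of the post-training static quantization flow I am following (the model and calibration input below are placeholders for my real network, and I am assuming an x86 machine for the `fbgemm` backend):

```python
import torch
import torch.nn as nn

# Placeholder model standing in for my real network.
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> quantized
        self.conv = nn.Conv3d(3, 16, kernel_size=3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # quantized -> float

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model = MyModel().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)

model(torch.randn(1, 3, 8, 32, 32))  # calibration pass with placeholder data

torch.quantization.convert(model, inplace=True)
```

Note that the default qconfig quantizes activations as `quint8` (unsigned), which is exactly where my problem starts, since I need signed values in -128 to 127.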
So far, I have experimented with both `fbgemm` and `qnnpack` for quantization:
- Activation Quantization: I was unable to get the activations as integer values in the range of -128 to 127. When I run inference on my computer, it says the `conv` layers do not support `qint8` as input (see the reproduction sketched after this list).
- QNNPACK Compatibility: I also tried using QNNPACK for inference, but hit a limitation: QNNPACK does not support the `conv3d_unpack` operation.
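Here is a minimal reproduction of the activation issue as I understand it (I am using a 2D conv for brevity, and the scale and zero point are arbitrary):

```python
import torch

# A freshly constructed quantized conv, standing in for the converted layers.
qconv = torch.nn.quantized.Conv2d(3, 8, kernel_size=3)

x = torch.randn(1, 3, 32, 32)
# Quantize the activation as signed 8-bit, which is what the FPGA needs.
x_qint8 = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)

try:
    qconv(x_qint8)  # raises: the quantized conv kernels expect quint8 input
except RuntimeError as e:
    print(e)
```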
Does PyTorch itself offer any features that could support `qint8` activations, or would I need to explore other frameworks or toolchains?
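To make the question concrete, this is roughly what I need to extract for the FPGA. It already works for weights via `int_repr()`, but I have not found a supported way to run the `conv` layers with `qint8` activations end to end:

```python
import torch

x = torch.randn(4)
# Signed 8-bit quantization with a zero point of 0, as the FPGA expects.
qx = torch.quantize_per_tensor(x, scale=0.05, zero_point=0, dtype=torch.qint8)

print(qx.int_repr())        # raw integer values in [-128, 127]
print(qx.int_repr().dtype)  # torch.int8
```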