How to use quantized weights for a manual implementation of a model on an FPGA?

Hi! I’m starting to study the implementation of quantized models on FPGAs. Although I’m still learning, I would like to know whether I can use PyTorch quantization for this.

I mean, I’d like to train a simple CNN model using PyTorch, quantize it to integers, and save the quantized weights and biases to a file, so that I can later load them into the same CNN implemented manually on the FPGA.

So I believe I need to apply Post-Training Static Quantization to the trained model, as shown on this page, but I’m not quite sure what to do with the weights and biases after the PyTorch quantization process. When I inspect the weights of the layers after quantization, they still appear to be floats, but they now also carry scale and zero_point values. How can I use this information to actually obtain the weights and biases as integer values for a future manual implementation of the model on the FPGA?
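For reference, here is roughly the flow I’m following from the tutorial (the tiny CNN below is just a placeholder for my actual model); the last line prints the weights together with scale and zero_point, which is where I get stuck:

```python
import torch
import torch.nn as nn

# Toy CNN standing in for my real model (architecture and names are placeholders).
class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()
        self.conv = nn.Conv2d(1, 8, 3)
        self.relu = nn.ReLU()
        self.fc = nn.Linear(8 * 26 * 26, 10)
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.relu(self.conv(self.quant(x)))
        x = self.fc(x.flatten(1))
        return self.dequant(x)

model = SmallCNN().eval()
model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
prepared = torch.ao.quantization.prepare(model)
prepared(torch.randn(4, 1, 28, 28))              # calibration pass with representative data
quantized = torch.ao.quantization.convert(prepared)

print(quantized.conv.weight())                   # float-looking values plus scale and zero_point
```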

Also, I would really appreciate any tips or suggestions anyone might have for this kind of hardware implementation.

Thank you very much.

@lucasmazz, here is a discussion on how you can perform post-training static quantization on pre-trained models: https://discuss.pytorch.org/t/i-want-to-quantize-my-trained-model-model-pth-model-size-188mb/151065/2?u=sairam954

PyTorch also provides int8 pre-trained models; the available models are listed here: [Models and pre-trained weights — Torchvision 0.14 documentation](https://pytorch.org/vision/0.14/models.html)
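For instance, something like this gives you a model whose conv weights are already int8 (a minimal sketch; I’m assuming torchvision ≥ 0.13 so the quantized-weights enums are available, and the choice of ResNet-18 is arbitrary):

```python
from torchvision.models.quantization import resnet18, ResNet18_QuantizedWeights

# Load an int8 ResNet-18 that torchvision has already quantized and calibrated.
model = resnet18(weights=ResNet18_QuantizedWeights.DEFAULT, quantize=True).eval()

# The first conv layer is a quantized module; weight() returns a qint8 tensor.
w = model.conv1.weight()
print(w.dtype)   # torch.qint8
```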

You can access the int8 values of quantized tensors by using the `int_repr()` method.
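For example (a minimal standalone sketch using a per-tensor quantized weight; the shapes, scale, and file name are arbitrary):

```python
import numpy as np
import torch

# Quantize a float weight tensor to int8 with a per-tensor scale/zero_point.
w_float = torch.randn(8, 1, 3, 3)
w_q = torch.quantize_per_tensor(w_float, scale=0.02, zero_point=0, dtype=torch.qint8)

int_weights = w_q.int_repr()          # the actual int8 values you would put on the FPGA
scale = w_q.q_scale()
zero_point = w_q.q_zero_point()

# Reconstruction used by the dequantize step: float ≈ (int - zero_point) * scale
print(torch.allclose(w_q.dequantize(),
                     (int_weights.float() - zero_point) * scale))

# Dump to a file you can parse on the FPGA side (the format is up to you).
np.savez("conv_weight.npz",
         weight_int8=int_weights.numpy(),
         scale=scale,
         zero_point=zero_point)
```

For a converted model, the quantized modules expose the weight through a method call, e.g. `quantized_model.conv.weight()`; with the default fbgemm qconfig the weights are quantized per channel, so `q_per_channel_scales()` and `q_per_channel_zero_points()` are the accessors to read instead of `q_scale()` / `q_zero_point()`. Also note that `bias()` of a quantized conv/linear module is still a float tensor; for an integer-only datapath it is commonly re-quantized to int32 with scale = input_scale * weight_scale, but that is a convention you would have to apply yourself, not something PyTorch writes out for you.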

I don’t have an answer for how you would implement it on an FPGA. I’m working on a similar requirement; if it works out, I will share more about it.

Thanks