I wrote a UNet-based model that inpaints the corrupted pixels in an input image. It is written in Python using the PyTorch framework. It is a relatively large network, so the inference time is 200 ms/image on CPU and 80 ms/image on GPU. Now I want to deploy this model on an Intel FPGA in embedded products driven by an ARM core. The reasons for doing this are:
- To reduce the inference time
- To save computing power on the end user's device
I am still investigating how to get this done. I want to avoid rewriting the model from the ground up in HDL on the FPGA. Has anyone done something like this before?