How to import a quantized TFLite model into PyTorch


I’m fairly new to PyTorch and I’d like to understand how to import a quantized TFLite model into PyTorch so I can work on it in PyTorch.

I already have a PyTorch model definition which matches the model used to create the .tflite file – except that this tflite file has been quantized, presumably automatically at export time.

There are two aspects of this I want to understand better.

First, the conv2d kernels and biases in the TFLite file are float16. Of course, I load these tensors’ buffers into float16 numpy arrays when I am reading from the tflite file. But is it enough to use these float16 numpy arrays as the values when I am populating the state_dict for my PyTorch model? Or do I need to define the torch model differently, for instance when I initialize the nn.Conv2d modules?
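For concreteness, this is roughly the kind of loading code I have in mind (the model and key names are just placeholders, and I'm assuming TFLite's Conv2D filter layout of (out_ch, H, W, in_ch) versus PyTorch's (out_ch, in_ch, H, W)):

```python
import numpy as np
import torch
import torch.nn as nn

# Placeholder model standing in for my real PyTorch definition.
model = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3))

# Stand-ins for the float16 buffers read out of the .tflite file.
# TFLite stores Conv2D filters as OHWI: (out_ch, kH, kW, in_ch).
kernel_f16 = np.zeros((16, 3, 3, 3), dtype=np.float16)
bias_f16 = np.zeros((16,), dtype=np.float16)

# Cast to float32 and permute OHWI -> OIHW before loading,
# since nn.Conv2d parameters are float32 with (out, in, kH, kW) layout.
state_dict = {
    "0.weight": torch.from_numpy(kernel_f16.astype(np.float32)).permute(0, 3, 1, 2),
    "0.bias": torch.from_numpy(bias_f16.astype(np.float32)),
}
model.load_state_dict(state_dict)
```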

Second, I notice that this TFLite model has many (about 75) automatically generated “dequantize” layers after the normal-seeming part of the model. Do I need to manually add layers to my PyTorch model to match all these TFLite dequantization layers?

I’d appreciate any advice, especially pointers to any examples of how to do this.

You can get the TF model weights and load them into the non-quantized PyTorch model. After that you can quantize the model using the PTQ flow described here:
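As a minimal sketch of that PTQ flow, here is eager-mode post-training static quantization with a toy model (the module structure is illustrative; choose a qconfig that matches your target backend, and calibrate with real representative data rather than random tensors):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, get_default_qconfig, prepare, convert,
)

class Net(nn.Module):
    # Toy float model standing in for your PyTorch definition.
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # marks where float -> quantized
        self.conv = nn.Conv2d(3, 16, kernel_size=3)
        self.dequant = DeQuantStub()  # marks where quantized -> float

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

model = Net().eval()
model.qconfig = get_default_qconfig("fbgemm")  # x86 backend; use "qnnpack" on ARM
prepared = prepare(model)                 # inserts observers
prepared(torch.randn(1, 3, 32, 32))       # calibration pass (use real data here)
quantized = convert(prepared)             # replaces modules with quantized versions
```

Note that the explicit QuantStub/DeQuantStub pair plays the role of TFLite's auto-inserted quantize/dequantize ops, so you declare the boundaries once rather than matching each generated layer.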