Quantization of EfficientNet Models

Hi,
I applied quantization to EfficientNet models, following the post-training static quantization method described in the PyTorch blogs. However, I was only able to reduce the model size by 5 MB.
Also, I wasn’t able to perform the layer fusion step on the prebuilt layers of this model when quantizing with the existing PyTorch techniques. How should I approach this problem? Or is there an alternative method to bring down the size of the model without affecting its accuracy much?
Can someone help me with this?

Thanks in advance!

It might be related to fusion. Why can’t you do fusion?
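
For reference, here is a minimal sketch of what the fusion step could look like on one block. `ToyBlock` is a hypothetical stand-in, not the actual EfficientNet code, and the submodule names `conv`, `bn`, and `act` are assumptions; a real model would need its own module names passed to `fuse_modules`. It assumes the block ends in SiLU (Swish), as EfficientNet blocks typically do. As far as I know, eager-mode fusion covers patterns such as Conv+BN and Conv+BN+ReLU but not Conv+BN+SiLU, so only the Conv+BN pair is fused here.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import fuse_modules


class ToyBlock(nn.Module):
    """Hypothetical stand-in for one EfficientNet-style conv block."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(8)
        self.act = nn.SiLU()  # Swish activation, left unfused

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


torch.manual_seed(0)
model = ToyBlock()

# Run one batch in train mode so the BatchNorm running stats are non-trivial.
model.train()
model(torch.randn(4, 3, 16, 16))

# Fusion for post-training quantization expects eval mode.
model.eval()
x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    ref = model(x)

# Fold BatchNorm into the preceding conv; returns a fused copy by default.
fused = fuse_modules(model, [["conv", "bn"]])
with torch.no_grad():
    out = fused(x)

# The bn slot is replaced by an identity module and outputs should still match.
print(type(fused.conv).__name__, type(fused.bn).__name__)
print(torch.allclose(ref, out, atol=1e-4))
```

If fusion fails on the prebuilt layers, it may help to print `model.named_modules()` and check that the names passed to `fuse_modules` match the actual submodule paths.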