Any method to optimize graph for QAT model?

Hello!
I want to deploy a pytorch model trained in quantization-aware training method and then convert the model to other inference deep learning frame works(such as MACE) to deploy on mobile devices.
I am using a model based on https://github.com/pytorch/vision/blob/master/torchvision/models/mobilenet.py
my modification is:
1: change change bias=False to bias=True
2: change out_features of classifier to 20 because I have only 20 classes.
I use torch.jit to trace my model and save it.
The traced model is available in:

I find the graph created by jit is not optimized:

The consants are not folded so there are add, sqrt, div and mul before conv layer, the bias of conv layer is added twice, BN parameters are not folded into Conv layer.

Is there any suggestion to optimize a jit model to do these folds?
I want to deploy the model in mobile device so these folds can be useful.

Many thanks.

Did you quantize the model before transferring? See here https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html#quantization-aware-training for the steps.