FX mode static quantization for YOLOv7

I am using FX-mode post-training static quantization on YOLOv7,
but I always get an error about control flow (it happens in yolov7/models/yolo.py, in the fuseforward() function, and I don't know how to solve it).

You can see the code below:

My PyTorch version is 2.1.0, and I followed the quantization code in the PyTorch 2.1.0 documentation.
Please help me solve it, thank you!

That's the main issue with FX quantization: your model needs to be symbolically traceable. You'll need to get rid of the control flow or try some other quantization method.

This error is documented in the quantization docs here: Quantization — PyTorch 2.2 documentation with links to additional resources for making models symbolically traceable.
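For illustration, one way to work around non-traceable control flow (besides rewriting it so the branch is decided outside of forward()) is to tell prepare_fx to skip the offending submodule via PrepareCustomConfig. A minimal sketch, where the submodule name "model.105" is a hypothetical placeholder for whichever module holds the control flow, and model_to_quantize is assumed to be the eval-mode YOLOv7 model:

import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx
from torch.ao.quantization.fx.custom_config import PrepareCustomConfig

# "model.105" is a placeholder: the named submodule containing the
# data-dependent branch is left untraced (and therefore unquantized)
prepare_custom_config = PrepareCustomConfig().set_non_traceable_module_names(["model.105"])

qconfig_mapping = get_default_qconfig_mapping("fbgemm")
example_inputs = (torch.randn(1, 3, 1024, 1024),)

model_prepared = prepare_fx(
    model_to_quantize,              # assumed: deep-copied, eval-mode model
    qconfig_mapping,
    example_inputs,
    prepare_custom_config=prepare_custom_config,
)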

I have fixed this error by modifying the model's code.
But now I am seeing a precision loss at inference time after applying post-training FX-mode static quantization.
May I ask how to find the problem, or where to modify things, so that I can continue my quantization work?

Here is a guide for numerical debugging: Quantization Accuracy Debugging — PyTorch 2.2 documentation, though we may have added updated tools since then.
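As a rough first check before going through the full debugging guide, one can compare the float and quantized model outputs on a few calibration images, for example via SQNR. A minimal sketch, assuming model_fp32, model_quantized, and a preprocessed 1x3xHxW tensor img already exist, and that the model returns a tuple whose first element is the detection tensor:

import torch

def sqnr(x, y):
    # signal-to-quantization-noise ratio in dB; higher means the quantized
    # output is closer to the float output
    return 20 * torch.log10(x.norm() / (x - y).norm())

with torch.no_grad():
    out_fp32 = model_fp32(img)[0]
    out_int8 = model_quantized(img)[0]
print(f"output SQNR: {sqnr(out_fp32, out_int8):.1f} dB")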

Does it work with PyTorch 2.1.0?

yeah it should work on 2.1

Thx a lot!!
I am not sure about the fourth piece of advice, "consider QAT when trying PTQ". Does that mean I need to use QAT in my training step and then apply PTQ? Or, after I apply PTQ, should I use QAT as a kind of finetuning?

Hi, I am trying to use my "static_quantize_weights.pt" to do QAT (or you could call it finetuning);
but I ran into the problem that the quantized weights don't have the same keys as the original model.

The error message is as below:

Traceback (most recent call last):
  File "Static_quantize5.py", line 200, in <module>
    train(hyp, opt=opt, device=torch.device('cpu'), tb_writer=tb_writer)
  File "/home/gene/YuRen/yolov7/train.py", line 102, in train
    state_dict = ckpt['model'].float().state_dict()  # to FP32
KeyError: 'model'

What should I do next?

It looks like ckpt just does not have 'model' as a key, so maybe you can check what it does contain and decide what to do next.
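For example, a quick way to inspect what the checkpoint actually contains (the file name is the one used later in this thread); note that a checkpoint saved via torch.save(model.state_dict(), ...) is a plain dict of tensor names and will not have a 'model' entry:

import torch

ckpt = torch.load("static_quantized_weights.pt", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:20])  # first few keys to see the checkpoint layout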

Hi,
I am using the code below to do quantization:
###############
# snippet imports (YOLOv7 helpers such as attempt_load, check_img_size,
# LoadImages and opt come from the repo's own modules)
import copy

import torch
from torch.ao.quantization import get_default_qconfig_mapping, quantize_fx


def saveModel(model):
    torch.save(model.state_dict(), 'static_quantized_weights.pt')


def static_quantize(weight_path, qconfig_mapping):
    weights, imgsz, cali_dataset = opt.weights, opt.img_size, opt.datasets

    # Load model
    device = torch.device("cpu")
    ckpt = torch.load(weights[0])
    model = attempt_load(weights, map_location=device)

    stride = int(model.stride.max())  # model stride
    imgsz = check_img_size(imgsz, s=stride)  # check img_size

    dataset_processing = LoadImages(cali_dataset, img_size=imgsz, stride=stride)

    model_to_quantize = copy.deepcopy(model)

    example_input = torch.randn(1, 3, 1024, 1024)
    qconfig_mapping = get_default_qconfig_mapping("fbgemm")  # better for Intel CPUs
    model_to_quantize.eval()

    # prepare
    model_prepared = quantize_fx.prepare_fx(model_to_quantize, qconfig_mapping, example_input)
    calibration(model_prepared, dataset_processing)
    # quantize
    model_quantized = quantize_fx.convert_fx(model_prepared)
    saveModel(model_quantized)

    loaded_model = torch.load('static_quantized_weights.pt', map_location='cpu')
    print("Static Quantization Success!!!!")

##########################
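For reference, the calibration(...) helper used above is not shown in the post; a rough sketch of what it might look like, assuming YOLOv7's LoadImages yields (path, img, im0s, vid_cap) with img as a CHW uint8 numpy array:

def calibration(model_prepared, dataset, num_images=100):
    # feed calibration images through the observer-instrumented model
    model_prepared.eval()
    with torch.no_grad():
        for i, (path, img, im0s, vid_cap) in enumerate(dataset):
            img = torch.from_numpy(img).float() / 255.0  # uint8 0-255 -> float 0.0-1.0
            if img.ndimension() == 3:
                img = img.unsqueeze(0)                   # add batch dimension
            model_prepared(img)
            if i + 1 >= num_images:
                break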

And I am using the code below to load the quantized model:
###################
ckpt = torch.load(weights, map_location=device)  # load checkpoint
model_ckpt = attempt_load(weights, map_location=device)  # you get the weights from ckpt['model']
exclude = ['anchor'] if (opt.cfg or hyp.get('anchors')) and not opt.resume else []  # exclude keys

quantized_weights = torch.load(opt.quantization_weights, map_location='cpu')
example_input = torch.randn(1, 3, 1024, 1024)
qconfig_mapping = get_default_qconfig_mapping("fbgemm")
model_ckpt.to(device)

anchor_grid_modules2 = find_anchor_grid_modules(model_ckpt)
grid_file_path2 = "Original_anchor_grid_modules.txt"
write_anchor_grid_modules_to_file(anchor_grid_modules2, grid_file_path2)

with open('parameter_groups_Originalmodel.txt', 'w') as f:
    for name, param in model_ckpt.named_parameters():  # OK, has values
        f.write(f'{name}\n')

with open('module_names.txt', 'w') as f:
    # Iterate over named modules
    for name, module in model_ckpt.named_modules():
        # Write module to file
        f.write(f'{module}\n')

# prepare
model_prepared = quantize_fx.prepare_fx(model_ckpt, qconfig_mapping, example_input)

# calibrate (not shown)

# quantize
model = quantize_fx.convert_fx(model_prepared)

model.load_state_dict(quantized_weights, strict=False)
model.to(device)

with open('model_state_dict_keys.txt', 'w') as f:
    for key in model.state_dict().keys():
        f.write(f'{key}\n')

with open('quantized_model_state_dict.txt', 'w') as f:  # OK, has values
    count = 0
    for key, value in model.state_dict().items():
        f.write(f'{key}: {value}\n')
        count += 1
        if count >= 30:
            break

with open('parameter_groups_Qmodel.txt', 'w') as f:  # empty, no values
    for idx, param in enumerate(model.parameters()):
        f.write(f'Parameter {idx}: {param.shape}\n')

with open('Quan_module_names.txt', 'w') as f:  # OK, has values
    # Iterate over named modules
    for name, module in model.named_modules():
        # Write module name to file
        f.write(f'{name}\n')

###############
As shown above, model_ckpt has values for .named_parameters(), .named_modules(), and .state_dict();
but the model that loads the quantized weights has no values for model.named_parameters().

I am not sure what happened when loading my quantized model.
Do you have any idea, please?
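One check that may narrow this down: load_state_dict(..., strict=False) returns the lists of keys it could not match, which shows exactly where the quantized checkpoint and the rebuilt model disagree. A minimal sketch, reusing the names from the snippet above:

result = model.load_state_dict(quantized_weights, strict=False)
print("missing keys:", result.missing_keys[:10])       # present in the model, absent in the checkpoint
print("unexpected keys:", result.unexpected_keys[:10])  # present in the checkpoint, absent in the model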

have you looked at Quantization — PyTorch 2.3 documentation?

The other thing you could try is something along these lines:

model_prep = prepare_fx(model, qconfig_mapping, example_input)
calibration(model_prep)
torch.save(model_prep.state_dict(), "prep_state_dict.pt")
# ...etc...
model = get_new_model()
model_prep = prepare_fx(model, qconfig_mapping, example_input)
prep_state_dict = torch.load("prep_state_dict.pt")
model_prep.load_state_dict(prep_state_dict)
model_quantized = convert_fx(model_prep)

Oh… I have not read the 2.3 documentation because my machine only supports PyTorch 2.1.
I will give it a try today, thanks!!!

So… I think I should load the prepared model's state_dict (prep_state_dict.pt) into my original model and then run convert_fx().
And then… do I also need to call model.load_state_dict(quantized_weights, strict=False) to load my quantized weights?

you would do one or the other.

If you want to load the prepared state_dict, you would do the steps up to calibration, then after calibration you save the state_dict. To load, you would create a fresh model, apply the same prepare_fx setup, load the prepared state_dict, then convert the model to get the final quantized model. In this case what you'd be saving/loading is the calibration statistics that were collected, so convert_fx can be applied correctly.

If you want to load the converted state_dict, you would do the steps up to convert_fx and then save that state_dict. To load, you would create a fresh model, prepare and convert it, then load the state_dict. In this case you'd be saving/loading the actual quantized weights. However, there are sometimes some weird serialization bugs.
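A minimal sketch of this second option, reusing the hypothetical get_new_model and calibration helpers plus the qconfig_mapping and example_input from the earlier snippets:

# save: prepare -> calibrate -> convert -> save the converted state_dict
model_prep = quantize_fx.prepare_fx(get_new_model().eval(), qconfig_mapping, example_input)
calibration(model_prep)
model_q = quantize_fx.convert_fx(model_prep)
torch.save(model_q.state_dict(), "converted_state_dict.pt")

# load: rebuild the same prepared/converted structure, then load the weights
model_prep2 = quantize_fx.prepare_fx(get_new_model().eval(), qconfig_mapping, example_input)
model_prep2(example_input)  # one dummy pass so observers have values before convert
model_q2 = quantize_fx.convert_fx(model_prep2)
model_q2.load_state_dict(torch.load("converted_state_dict.pt"))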

either should work.

About the first way ("save the model's state_dict after prepare & calibration, then run convert_fx when retraining"):
if I use the retrained model's weights for the inference job, might some of the parameters (or the model's state_dict keys) disappear, just like my earlier problem?

I think I need to save the retrained quantized model's weights via state_dict(), or do I have to use torch.jit.save/load instead?
I am not sure whether torch.jit will work with my PyTorch 2.1.

I think the torch.jit model type is not compatible with my code.