I have tried post-training dynamic quantization with the YOLOv5 model. The model file is available here: https://github.com/ultralytics/yolov5/releases/download/v5.0/yolov5s.pt
After quantization, the model file roughly doubles in size, and inference time is the same as with the FP32 model.
I based my code on the PyTorch tutorial (Dynamic Quantization — PyTorch Tutorials 1.9.0+cu102 documentation):
model = attempt_load(weights, map_location=device)
quantized_model = torch.quantization.quantize_dynamic(model, {torch.nn.Conv2d, torch.nn.Linear}, dtype=torch.qint8) #int8
# model = torch.quantization.quantize_dynamic(model, dtype=torch.qint8) #int8
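To sanity-check the call itself, here is a minimal standalone sketch on a toy model (the model and its layer names are hypothetical, just to illustrate the same `quantize_dynamic` call mixing Conv2d and Linear):

```python
import torch
import torch.nn as nn

# Hypothetical toy model mixing the two module types from my qconfig set
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.fc = nn.Linear(8, 4)

    def forward(self, x):
        x = self.conv(x).mean(dim=(2, 3))  # global average pool over H, W
        return self.fc(x)

model = TinyNet()
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Conv2d, torch.nn.Linear}, dtype=torch.qint8
)

# Inspect which submodules were actually replaced by quantized versions
for name, module in quantized.named_modules():
    print(name, type(module).__name__)
```

When I inspect the result like this, only the Linear layer seems to be swapped for a dynamically quantized version, while the Conv2d is left as-is, which might be related to what I am seeing with YOLOv5 (it is almost entirely Conv2d).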
I am not sure whether `{torch.nn.Conv2d, torch.nn.Linear}` (the set of submodule types in the model to apply quantization to) is being used correctly when applying it to the YOLO model.
I even checked the dtypes using
for param in quantized_model.parameters(): print(param.dtype)
and they are all still torch.float32.
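As an aside, I noticed on a toy Linear-only model (hypothetical, not YOLO) that dynamically quantized modules do not expose their packed int8 weights through `.parameters()` at all, so maybe the fp32 dtypes I see are coming only from submodules that were never quantized:

```python
import torch
import torch.nn as nn

# Hypothetical single-Linear model, just to probe how parameters() behaves
model = nn.Sequential(nn.Linear(16, 16))
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The packed int8 weight is not an nn.Parameter, so parameters() skips it
print(len(list(model.parameters())))      # fp32 Linear: weight + bias
print(len(list(quantized.parameters())))  # quantized Linear: none
```

If that is right, looping over `parameters()` is not a reliable way to verify quantization, since it only ever shows the remaining fp32 parameters.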
Thanks in advance.