I have tried post-training dynamic quantization with the YOLOv5 model. The model file is available here: https://github.com/ultralytics/yolov5/releases/download/v5.0/yolov5s.pt
After quantization, the model file roughly doubles in size, and inference time is the same as with the FP32 model.
I based my code on the PyTorch tutorial (Dynamic Quantization — PyTorch Tutorials 1.9.0+cu102 documentation):
model = attempt_load(weights, map_location=device)
quantized_model = torch.quantization.quantize_dynamic(model, {torch.nn.Conv2d, torch.nn.Linear}, dtype=torch.qint8) #int8
# model = torch.quantization.quantize_dynamic(model, dtype=torch.qint8) #int8
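To sanity-check the call itself, here is a minimal standalone sketch on a toy model (the model and its layer names are hypothetical, just to illustrate the same `quantize_dynamic` call mixing Conv2d and Linear):

```python
import torch
import torch.nn as nn

# Hypothetical toy model mixing the two module types from my qconfig set
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.fc = nn.Linear(8, 4)

    def forward(self, x):
        x = self.conv(x).mean(dim=(2, 3))  # global average pool over H, W
        return self.fc(x)

model = TinyNet()
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Conv2d, torch.nn.Linear}, dtype=torch.qint8
)

# Inspect which submodules were actually replaced by quantized versions
for name, module in quantized.named_modules():
    print(name, type(module).__name__)
```

When I inspect the result like this, only the Linear layer seems to be swapped for a dynamically quantized version, while the Conv2d is left as-is, which might be related to what I am seeing with YOLOv5 (it is almost entirely Conv2d).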
I am not sure whether `{torch.nn.Conv2d, torch.nn.Linear}` (the set of submodule types in the model to apply quantization to) is being used correctly when applying it to the YOLO model.
I even checked the dtypes using
for param in quantized_model.parameters(): print(param.dtype)
and they are all still torch.float32.
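As an aside, I noticed on a toy Linear-only model (hypothetical, not YOLO) that dynamically quantized modules do not expose their packed int8 weights through `.parameters()` at all, so maybe the fp32 dtypes I see are coming only from submodules that were never quantized:

```python
import torch
import torch.nn as nn

# Hypothetical single-Linear model, just to probe how parameters() behaves
model = nn.Sequential(nn.Linear(16, 16))
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The packed int8 weight is not an nn.Parameter, so parameters() skips it
print(len(list(model.parameters())))      # fp32 Linear: weight + bias
print(len(list(quantized.parameters())))  # quantized Linear: none
```

If that is right, looping over `parameters()` is not a reliable way to verify quantization, since it only ever shows the remaining fp32 parameters.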
Thanks in advance.