Yes, it really does work only for Linear:
(conv_head): Conv2d(384, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn2): BatchNorm2d(1280, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): ReLU6(inplace=True)
(global_pool): SelectAdaptivePool2d (pool_type=avg, flatten=True)
(classifier): DynamicQuantizedLinear(in_features=1280, out_features=5, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
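For context on the printout above: eager-mode dynamic quantization (`torch.quantization.quantize_dynamic`) only swaps layer types that have dynamic int8 kernels, mainly `nn.Linear` and recurrent layers such as `nn.LSTM`. Convolutions stay in float32, which would explain why only `classifier` changed. A minimal sketch, using a hypothetical `TinyNet` in place of the real model:

```python
import torch
import torch.nn as nn


# Hypothetical stand-in for the real network: a conv trunk plus a Linear head.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(8, 5)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x))).flatten(1)
        return self.classifier(x)


float_model = TinyNet().eval()
# Only module types listed in the set are replaced with dynamic quantized versions.
dq_model = torch.quantization.quantize_dynamic(float_model, {nn.Linear}, dtype=torch.qint8)
print(dq_model)

out = dq_model(torch.randn(1, 3, 32, 32))
```

In the printed model only `classifier` becomes a `DynamicQuantizedLinear`; `conv` is still an ordinary float `nn.Conv2d`.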
But my goal is to measure the inference time of the quantized model and compare it with the float32 model. For that, I tried static quantization:
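import torch
model_sigmoid.eval()
model_sigmoid.qconfig = torch.quantization.get_default_qconfig('qnnpack')
# insert observers
torch.quantization.prepare(model_sigmoid, inplace=True)
# Calibrate the model and collect statistics (calibration_batches is a hypothetical iterable of float inputs)
with torch.no_grad():
    for batch in calibration_batches:
        model_sigmoid(batch)
# convert to quantized version
torch.quantization.convert(model_sigmoid, inplace=True)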
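One likely gap in the static workflow above is that eager-mode static quantization does not insert the float-to-int8 conversion for you: the model itself has to quantize its input, typically with `torch.quantization.QuantStub` at the start of `forward` and `torch.quantization.DeQuantStub` at the end. The sketch below shows the full prepare, calibrate, convert flow on a hypothetical small network (`QuantReadyNet`, calibrated on random data purely for illustration). It assumes the engine should be `fbgemm` when available (the usual x86 engine) and falls back to `qnnpack`, using the same name for `torch.backends.quantized.engine` and the qconfig.

```python
import torch
import torch.nn as nn


class QuantReadyNet(nn.Module):
    """Hypothetical model with explicit quantize/dequantize boundaries."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float32 -> quint8 at the input
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(8, 5)
        self.dequant = torch.quantization.DeQuantStub()  # quint8 -> float32 at the output

    def forward(self, x):
        x = self.quant(x)
        x = self.pool(self.relu(self.conv(x)))
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return self.dequant(x)


engine = "fbgemm" if "fbgemm" in torch.backends.quantized.supported_engines else "qnnpack"
torch.backends.quantized.engine = engine

model = QuantReadyNet().eval()
model.qconfig = torch.quantization.get_default_qconfig(engine)
torch.quantization.prepare(model, inplace=True)   # insert observers

# Calibration: random tensors only for illustration; real images should be used in practice.
with torch.no_grad():
    for _ in range(8):
        model(torch.randn(4, 3, 32, 32))

torch.quantization.convert(model, inplace=True)   # swap in quantized modules
out = model(torch.randn(1, 3, 32, 32))            # plain float input works thanks to QuantStub
```

Because `QuantStub` converts the input inside `forward`, callers keep passing ordinary float tensors and receive float outputs.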
This code quantizes all the layers. But I can't run the quantized model, because of this error:
import time
import torch

start_time = time.time()
with torch.no_grad():
    # with torch.autograd.set_detect_anomaly(True):
    pred = model_sigmoid(torch_img)
print('Time = ', time.time() - start_time)
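A single `time.time()` around one forward pass mostly measures one-off overhead. A slightly more robust sketch warms up first and averages several runs with `time.perf_counter()`, assuming both models run on the CPU (the FBGEMM/QNNPACK quantized kernels are CPU kernels). The models are hypothetical stand-ins, and dynamic quantization is used here only to keep the example short.

```python
import time

import torch
import torch.nn as nn


def mean_latency(model, example, warmup=3, runs=20):
    """Average seconds per forward pass, measured after a few warm-up calls."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):
            model(example)
        start = time.perf_counter()
        for _ in range(runs):
            model(example)
        elapsed = time.perf_counter() - start
    return elapsed / runs


# Hypothetical float32 model standing in for model_sigmoid.
float_model = nn.Sequential(
    nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(), nn.Linear(256, 5)
).eval()
# quantize_dynamic returns a quantized copy and leaves float_model untouched.
int8_model = torch.quantization.quantize_dynamic(float_model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 3, 32, 32)
print(f"float32: {mean_latency(float_model, x) * 1e3:.3f} ms")
print(f"int8   : {mean_latency(int8_model, x) * 1e3:.3f} ms")
```

For a fair comparison with `model_sigmoid`, the float32 model should also be timed on the CPU, with the same input and the same thread count (`torch.set_num_threads`).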
RuntimeError: Could not run 'quantized::conv2d.new' with arguments from the 'CPU' backend. 'quantized::conv2d.new' is only available for these backends: [QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, Tracer, Autocast, Batched, VmapMode].
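One way to read this error: `CPU` and `QuantizedCPU` are dispatch keys, not devices. `quantized::conv2d.new` was called with an ordinary float32 tensor (key `CPU`), while it only has a kernel for quantized tensors (key `QuantizedCPU`), which still live on the CPU. Most of the other entries in the list (`BackendSelect`, `Named`, `Autograd*`, `Tracer`, `Autocast`, `Batched`, `VmapMode`) appear to be dispatcher functionality layers rather than hardware. A minimal illustration with `torch.quantize_per_tensor` (the scale and zero point are arbitrary choices):

```python
import torch

x = torch.linspace(-1.0, 1.0, 48).reshape(1, 3, 4, 4)  # ordinary float32 tensor -> 'CPU' key
qx = torch.quantize_per_tensor(x, scale=0.05, zero_point=128, dtype=torch.quint8)  # -> 'QuantizedCPU' key

print(x.is_quantized, x.device)    # False cpu
print(qx.is_quantized, qx.device)  # True cpu

# Converting back gives float32 again, off by at most about scale / 2 per element.
back = qx.dequantize()
```

In eager mode this conversion is normally done inside the model by a `torch.quantization.QuantStub`, so the quantized conv receives a `QuantizedCPU` tensor.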
I understand that this operator doesn't support the CPU and CUDA backends, so my question is: is it possible to run this statically quantized model on Windows 10 (x64)? It would also be great if you could go through each backend listed in the RuntimeError and say which device it can run on.
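On the Windows question: eager-mode quantized operators run on the CPU through a quantized engine, and official x86 builds (Windows included) generally ship FBGEMM, while whether QNNPACK is also compiled in depends on the build. Since the code above requests the `qnnpack` qconfig, a safer pattern is to check `torch.backends.quantized.supported_engines` and use the same engine name for both `torch.backends.quantized.engine` and `get_default_qconfig`. The helper name `pick_quantized_engine` and its preference order are illustrative assumptions:

```python
import torch


def pick_quantized_engine(preferred=("fbgemm", "x86", "qnnpack")):
    """Hypothetical helper: return the first preferred engine this PyTorch build supports."""
    supported = torch.backends.quantized.supported_engines
    for name in preferred:
        if name in supported:
            return name
    raise RuntimeError(f"No usable quantized engine; supported engines: {supported}")


engine = pick_quantized_engine()
torch.backends.quantized.engine = engine                 # engine used by quantized kernels
qconfig = torch.quantization.get_default_qconfig(engine)  # observers tuned for that engine
print("supported:", torch.backends.quantized.supported_engines, "| using:", engine)
```

The resulting `qconfig` would then be assigned to `model_sigmoid.qconfig` before `prepare`, so calibration and the converted kernels target the same engine.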