NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'QuantizedCPU' backend

I am trying to quantize a model, but I get an error when I execute:

model_int = torch.quantization.convert(model_fp_prepared)
print("Conversion completed")
q_output = model_int(img, mask)

NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'QuantizedCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit Internal Login for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, CUDA, Meta, MkldnnCPU, SparseCPU, SparseCUDA, BackendSelect, Python, Named, Conjugate, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].

It looks like it happened here:

output = self.input_conv(input_x * mask)

I think I figured it out. The error seems to happen when I try to multiply those two quantized tensors (input_x, mask). The workaround I took is:

# First, dequantize the quantized tensors
input_x = self.dequant(input_x)
mask = self.dequant(mask)
# Do the operation in float, then quantize the result back
masked = input_x * mask
masked = self.quant(masked)

# Re-quantize the inputs in case they are needed later
input_x = self.quant(input_x)
mask = self.quant(mask)

output = self.input_conv(masked)

It seems like pretty tedious work, but it works. However, can I use self.quant() multiple times like that, or should I use self.quant1(), self.quant2(), self.quant3() separately?

If you want to quantize multiplication, you'll need to rewrite `*` to use functional modules: pytorch/ at master · pytorch/pytorch · GitHub; an example can be found here: pytorch/ at e61fc1c03b64e61ca4f5bbe278db7ee2cf35e8ff · pytorch/pytorch · GitHub
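A minimal sketch of that rewrite, assuming eager-mode static quantization with `torch.nn.quantized.FloatFunctional` standing in for the bare `*` (the module name `MaskedConv` and its layer shapes are hypothetical, chosen only to mirror the `input_conv(input_x * mask)` pattern above):

```python
import torch
import torch.nn as nn


class MaskedConv(nn.Module):
    """Hypothetical module: conv over a masked input, quantization-friendly."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.mask_quant = torch.quantization.QuantStub()
        self.dequant = torch.quantization.DeQuantStub()
        self.input_conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)
        # FloatFunctional wraps the multiply so prepare()/convert()
        # can attach an observer and swap in the quantized mul kernel
        self.mul = nn.quantized.FloatFunctional()

    def forward(self, input_x, mask):
        input_x = self.quant(input_x)
        mask = self.mask_quant(mask)
        masked = self.mul.mul(input_x, mask)  # instead of input_x * mask
        return self.dequant(self.input_conv(masked))


# End-to-end eager-mode flow: calibrate, convert, run quantized
model = MaskedConv().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
prepared = torch.quantization.prepare(model)
prepared(torch.rand(1, 1, 8, 8), torch.rand(1, 1, 8, 8))  # calibration pass
model_int = torch.quantization.convert(prepared)
out = model_int(torch.rand(1, 1, 8, 8), torch.rand(1, 1, 8, 8))
```

With the multiply expressed as a module, `convert()` can replace it with its quantized counterpart, so no manual dequantize/quantize round-trip is needed.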

If you do want to dequantize and then quantize again, it's better to use a different instance of the quantize module for each point.
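That pattern might look like the following sketch (module and stub names are hypothetical); the point is that each `QuantStub`/`DeQuantStub` instance gets its own observer, and therefore its own scale and zero-point, during calibration:

```python
import torch
import torch.nn as nn


class DequantMulRequant(nn.Module):
    """Hypothetical sketch: one stub instance per (de)quantization point."""

    def __init__(self):
        super().__init__()
        self.input_conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)
        # separate DeQuantStubs for each input
        self.dequant_x = torch.quantization.DeQuantStub()
        self.dequant_m = torch.quantization.DeQuantStub()
        # a dedicated QuantStub for the product: its observer learns a
        # scale/zero-point for `masked`, independent of the inputs'
        self.quant_masked = torch.quantization.QuantStub()

    def forward(self, input_x, mask):
        masked = self.dequant_x(input_x) * self.dequant_m(mask)
        return self.input_conv(self.quant_masked(masked))


m = DequantMulRequant().eval()
# Before prepare()/convert(), the stubs are no-ops, so this runs in float
y = m(torch.rand(1, 1, 8, 8), torch.rand(1, 1, 8, 8))
```

Reusing one `self.quant` at several points would force every site to share a single observer, mixing the statistics of tensors with different value ranges.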