NotImplementedError: Could not run ‘aten::empty.memory_format’ with arguments from the ‘QuantizedCPU’ backend. This could be because the operator doesn’t exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit Internal Login for possible resolutions. ‘aten::empty.memory_format’ is only available for these backends: [CPU, CUDA, Meta, MkldnnCPU, SparseCPU, SparseCUDA, BackendSelect, Python, Named, Conjugate, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].
I think I figured it out. The error seems to occur when I multiply the quantized tensors (input_x, mask) directly. The workaround I used is:
# First, dequantize the quantized tensor
input_x = self.dequant(input_x)
mask = self.dequant(mask)
# Do the operation and quantize it back
masked = input_x * mask
masked = self.quant(masked)
input_x = self.quant(input_x)
mask = self.quant(mask)
output = self.input_conv(masked)
It is pretty tedious, but it works. However, can I call self.quant() multiple times like that, or should I use separate stubs such as self.quant1(), self.quant2(), and self.quant3()?
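For reference, here is a minimal sketch of the workaround above as a self-contained module, written with one QuantStub per tensor that gets quantized back. The class name MaskedConv, the stub names, and the Conv2d sizes are assumptions made up for illustration; they are not from my actual model. As far as I understand eager-mode quantization, each QuantStub gets its own observer during calibration, so reusing a single stub would make all three tensors share one scale and zero-point. That is why the sketch uses separate stubs, but treat it as an assumption rather than a confirmed answer. Before prepare/convert the stubs simply pass tensors through, so the snippet runs as plain float code.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import QuantStub, DeQuantStub


class MaskedConv(nn.Module):
    """Hypothetical module: dequantize -> multiply in float -> quantize back."""

    def __init__(self):
        super().__init__()
        # Assumption: one QuantStub per quantize-back point, so that each
        # tensor can end up with its own scale/zero-point after calibration.
        self.quant_masked = QuantStub()
        self.quant_input = QuantStub()
        self.quant_mask = QuantStub()
        # DeQuantStub carries no quantization parameters, so one is reused.
        self.dequant = DeQuantStub()
        # Layer sizes are placeholders chosen only to make the sketch runnable.
        self.input_conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, input_x, mask):
        # First, dequantize the (possibly quantized) tensors
        input_x = self.dequant(input_x)
        mask = self.dequant(mask)
        # Do the multiplication in float and quantize the results back
        masked = self.quant_masked(input_x * mask)
        input_x = self.quant_input(input_x)
        mask = self.quant_mask(mask)
        output = self.input_conv(masked)
        return output, input_x, mask


# Float-mode smoke test: the stubs are pass-throughs until the model is converted.
model = MaskedConv().eval()
x = torch.rand(1, 3, 8, 8)
m = (torch.rand(1, 3, 8, 8) > 0.5).float()
out, _, _ = model(x, m)
print(out.shape)
```

This only checks that the forward pass is wired correctly in float mode. It does not run calibration or conversion, so it says nothing about the accuracy of the quantized model.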