Following the official tutorial, I used this code for compilation:
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

from denoiser.pretrained import dns48
from denoiser.demucs import DemucsStreamer

# Load the pretrained DNS48 model and wrap it in the streaming helper
model = dns48()
streamer = DemucsStreamer(model)

# Attach a QNNPACK qconfig and convert to a quantized module
streamer.qconfig = torch.quantization.get_default_qconfig("qnnpack")
streamer = torch.quantization.convert(streamer, inplace=True)

# Script the model and optimize it for mobile
torchscript_model = torch.jit.script(streamer)
optimized_model = optimize_for_mobile(torchscript_model)
And of course I had to slightly modify the denoiser code (adding some type hints, etc.) to get it to script. Without quantization it runs, but far too slowly. After adding the quantization step, I get this error:
com.facebook.jni.CppException: Could not run 'quantized::conv1d' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::conv1d' is only available for these backends: [QuantizedCPU, BackendSelect, Functionalize, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy].
I am not sure if my backend falls into QuantizedCPU, or if it is really not supported. Is there anything I can do?
So what this is saying is that somehow the quantized conv1d is getting an FP32 tensor as input instead of a quantized one. You might want to take a look at the quantized TorchScript graph (do m = torch.jit.load(...) and print(m.graph)) and see what the input to conv1d is.
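A minimal sketch of that inspection, assuming you saved the scripted model to disk first (the file name here is just a placeholder):

import torch

# Load the scripted model you exported (path is a placeholder)
m = torch.jit.load("optimized_denoiser.pt")

# Look for the value feeding quantized::conv1d. If it comes straight from an
# fp32 op, with no aten::quantize_per_tensor in between, the input was never
# quantized, which matches the CPU-backend dispatch error you are seeing.
print(m.graph)

One likely cause, judging from your snippet: you call torch.quantization.convert without first calling torch.quantization.prepare and calibrating, and nothing ever quantizes the input tensor. In eager-mode static quantization the model (or a wrapper such as torch.quantization.QuantWrapper) needs QuantStub/DeQuantStub at its boundaries so a quantize op gets inserted before the quantized conv. A rough sketch of that flow, assuming DemucsStreamer behaves like a regular nn.Module with a single-tensor forward (it may not, given its streaming API, so treat this as an outline rather than a drop-in fix):

import torch

# QuantWrapper adds a QuantStub on the input and a DeQuantStub on the output
wrapped = torch.quantization.QuantWrapper(streamer)
wrapped.qconfig = torch.quantization.get_default_qconfig("qnnpack")
torch.backends.quantized.engine = "qnnpack"

prepared = torch.quantization.prepare(wrapped)
# Feed a few representative audio tensors through `prepared` here
# so the observers can record activation ranges (calibration).
quantized = torch.quantization.convert(prepared)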