Could not run 'quantized::cat' with arguments from the 'QuantizedCUDA' backend

Hi, I have replaced torch.cat with self.ff.cat as I encountered an error with torch.cat

After replacing, I got another error which is:

NotImplementedError: Could not run 'quantized::cat' with arguments from the 'QuantizedCUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit Internal Login for possible resolutions. 'quantized::cat' is only available for these backends: [QuantizedCPU, BackendSelect, Named, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, Tracer, Autocast, Batched, VmapMode].

So I tried adding x.to('cpu'), since QuantizedCPU is in the list of available backends, but I got yet another error, this time pointing at the x.to('cpu') line:

NotImplementedError: Could not run 'aten::empty_strided' with arguments from the 'QuantizedCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit Internal Login for possible resolutions. 'aten::empty_strided' is only available for these backends: [CPU, CUDA, Meta, BackendSelect, Named, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

We do not have any operators for the QuantizedCUDA backend currently, although the other path (x.to('cpu')) should work. Which PyTorch version are you using, or are you on master? cc @HDCharles, could you take a look?
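For context, the CPU path works because quantized::cat does have a QuantizedCPU kernel. A minimal standalone sketch (the scales, zero points, and shapes here are made up; QFunctional is the post-convert counterpart of FloatFunctional, and its module path can vary slightly across PyTorch versions):

```python
import torch

# Two quantized CPU tensors standing in for activations in a converted model
# (hypothetical scale/zero_point; a real model gets these from observers).
a = torch.quantize_per_tensor(torch.randn(2, 3), scale=0.1, zero_point=0,
                              dtype=torch.quint8)
b = torch.quantize_per_tensor(torch.randn(2, 3), scale=0.1, zero_point=0,
                              dtype=torch.quint8)

# QFunctional.cat dispatches to quantized::cat, which has a QuantizedCPU kernel.
qf = torch.nn.quantized.QFunctional()
out = qf.cat([a, b], dim=0)

print(out.is_quantized, out.shape)  # True torch.Size([4, 3])
```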

Sorry, it points to x.to('cpu') because I did not reassign the tensor; x = x.to('cpu') solved the aten::empty_strided issue.
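For anyone hitting the same thing: Tensor.to is out-of-place, so the call has no effect unless the result is reassigned. The same rule is illustrated here with a dtype change, since that doesn't require a GPU:

```python
import torch

x = torch.randn(4)

x.to(torch.float64)        # returns a new tensor; x itself is untouched
assert x.dtype == torch.float32

x = x.to(torch.float64)    # reassignment is what makes the change stick
assert x.dtype == torch.float64

# x.to('cpu') behaves the same way: without `x = x.to('cpu')`, the tensor
# stays on the QuantizedCUDA backend and quantized::cat still fails.
```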


In this case, would it be better to
dequant → torch.cat on CUDA → quant
or
a forward pass with ff.cat on the QuantizedCPU backend?
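For reference, the first option could be sketched like this (hypothetical helper; only the float torch.cat step would run on CUDA, and the sketch below runs it on CPU since the QuantizedCUDA backend has no ops):

```python
import torch

def cat_via_dequant(qtensors, dim, scale, zero_point):
    """Dequantize -> torch.cat in float -> re-quantize.

    The float torch.cat works on CPU or CUDA; the re-quantization step
    here assumes a CPU tensor (quantize_per_tensor support on CUDA
    depends on the PyTorch version).
    """
    out = torch.cat([t.dequantize() for t in qtensors], dim=dim)
    return torch.quantize_per_tensor(out, scale, zero_point, torch.quint8)

a = torch.quantize_per_tensor(torch.ones(2), 0.1, 0, torch.quint8)
b = torch.quantize_per_tensor(torch.ones(3), 0.1, 0, torch.quint8)
q = cat_via_dequant([a, b], dim=0, scale=0.1, zero_point=0)
print(q.shape)  # torch.Size([5])
```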

I feel that running everything on CPU would probably be better; running a single op on CUDA sounds a bit weird…

I'm not sure about your goal, though. If you find that running torch.cat on CUDA is faster and you need to squeeze out every bit of performance, then maybe that is OK as well.