Could not run 'aten::q_scale' with arguments from the 'CUDA' backend

Could not run 'aten::q_scale' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit Internal Login for possible resolutions. 'aten::q_scale' is only available for these backends: [QuantizedCPU, QuantizedCUDA, BackendSelect, Named, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

I get the same error when I try the CPU backend.
I even tried detaching the tensor and reassigning it to CUDA after the operation, and it still fails.

What I am doing seems to end up calling a C++ function that doesn't appear to be bound to Python.

Is there an alternative way to extract the tensor's quantization parameters, or a fix for this?

I think this is because you are calling q_scale on a non-quantized Tensor. Can you show us the code?
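To see the distinction, here is a minimal sketch: q_scale(), q_zero_point(), and int_repr() are only defined on quantized tensors, so calling them on a regular float tensor raises exactly the dispatcher error quoted above. (The scale/zero_point values below are arbitrary, chosen only for illustration.)

```python
import torch

# A regular float tensor: x.is_quantized is False, and x.q_scale()
# would raise NotImplementedError ("Could not run 'aten::q_scale' ...").
x = torch.randn(4)

# Quantize first; then the quantization parameters are accessible.
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
scale = qx.q_scale()            # 0.1
zero_point = qx.q_zero_point()  # 0
int_values = qx.int_repr()      # underlying uint8 values
```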

# custom observed module, provided by user
class CustomObserverModule(torch.nn.Module):
    ...
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        input_scale = x.q_scale()
        input_zero_point = x.q_zero_point()  # was overwriting input_scale above
        input_int = x.int_repr()
        ...
    ...
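For reference, a defensive variant of the observer (a sketch, assuming the module may receive either a float or a quantized tensor) can guard on is_quantized before reading the quantization parameters:

```python
import torch

# Hypothetical guarded observer: only read quantization parameters when
# the incoming tensor is actually quantized; otherwise fall through.
class SafeObserverModule(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.is_quantized:
            scale = x.q_scale()
            zero_point = x.q_zero_point()
            int_values = x.int_repr()
        else:
            # During prepare/QAT the observer receives a regular float
            # tensor, so q_scale()/q_zero_point() would raise here.
            scale, zero_point, int_values = None, None, x
        return x
```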

Are you passing a non-quantized Tensor to the forward function?


As input to the whole model I am passing a regular tensor. The QuantStub should be able to quantize the inputs and pass them along, right?

I am wrapping my model in QuantWrapper and then passing it to the prepare stage.

Somehow I am seeing that the input to the function is still a regular (non-quantized) tensor.

The quant stub gets replaced by a quantize op during convert. Observers take in non-quantized tensors and analyze them, so by calling x.q_scale() on the input in your observer, you are applying q_scale() to a regular float tensor, which causes the error.
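The prepare/convert flow described above can be sketched as follows (module paths assume a recent PyTorch; older releases expose the same helpers under torch.quantization, and the toy model here is only illustrative):

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()
        self.fc = torch.nn.Linear(4, 4)
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)   # observer during prepare; quantize op after convert
        x = self.fc(x)
        return self.dequant(x)

m = M().eval()
m.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
prepared = torch.ao.quantization.prepare(m)
prepared(torch.randn(2, 4))   # calibration: observers see *float* tensors here
converted = torch.ao.quantization.convert(prepared)  # QuantStub -> quantize op
out = converted(torch.randn(2, 4))  # internals now run on quantized tensors
```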

Agreed !

What I was doing was creating a custom observer module, and I was looking to see if I could pass in a quantized tensor.

Currently the model passes a regular tensor to the observer, while the weights and outputs are quantized and dequantized.

It is only in the inference stage (after convert) that the input is a quantized tensor.

This fails to take into account the combined scale (input_scale * weight_scale / output_scale) and its 32-bit fixed-point approximation while training in QAT.
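For concreteness, here is the combined-scale arithmetic with hypothetical per-tensor scales (the values are made up for illustration; the fixed-point normalization mirrors the usual gemmlowp-style requantization of the int32 accumulator):

```python
import math

# Hypothetical per-tensor scales, chosen only to illustrate the arithmetic.
input_scale = 0.02
weight_scale = 0.005
output_scale = 0.04

# Combined requantization scale applied to the int32 accumulator.
M = input_scale * weight_scale / output_scale  # 0.0025

# Integer-only kernels approximate M as a normalized 32-bit fixed-point
# multiplier: M ~ q_mult * 2**-31 * 2**-shift, with M0 = M * 2**shift in [0.5, 1).
shift = math.floor(math.log2(1.0 / M))  # 8
M0 = M * (1 << shift)                   # 0.64
q_mult = round(M0 * (1 << 31))          # 32-bit integer multiplier
```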

If there is a way to pass a quantized tensor to custom observers during QAT training, that would be greatly appreciated.