How to apply a quantized model on CUDA?

Hi! I’m a newbie at quantization. I ran into a problem while quantizing my model; it fails with the error below:

'quantized::embedding_byte' is only available for these backends: [CPU, Meta,   
BackendSelect, Python, FuncTorchDynamicLayerBackMode,   
Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, 
AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, 
AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, 
FuncTorchBatched, BatchedNestedTensor, 
FuncTorchVmapMode, Batched, VmapMode, 
FuncTorchGradWrapper, PythonTLSSnapshot, 
FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
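If I read the error right, eager-mode quantized kernels only exist for CPU, so the quantized model and its inputs have to stay on CPU. A minimal toy sketch (my own variable names, not my real model) that illustrates this:

```python
import torch

# Toy model: a single Embedding, quantized weight-only (qint8).
# Note: eager-mode quantized ops such as quantized::embedding_byte
# only have CPU kernels, per the error message above.
model = torch.nn.Sequential(torch.nn.Embedding(10, 4))
qmodel = torch.ao.quantization.quantize_dynamic(
    model,
    # Embedding requires the float_qparams weight-only qconfig
    {torch.nn.Embedding: torch.ao.quantization.float_qparams_weight_only_qconfig},
    dtype=torch.qint8,
)

ids = torch.tensor([[1, 2, 3]])  # CPU LongTensor
out = qmodel(ids)                # works on CPU
# qmodel.cuda()(ids.cuda())      # fails: quantized embedding has no CUDA kernel
```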

When I fall back to CPU, it raises another error:

RuntimeError: quantized::linear_dynamic() Expected a value of type 'Tensor' for argument 'X' but instead found type 'method'.
Position: 0
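For what it’s worth, I can reproduce this second error with a toy model by passing a bound method instead of a Tensor, e.g. forgetting the parentheses on a method call. A sketch (hypothetical names, not my real inference code):

```python
import torch

# Toy dynamically-quantized Linear model.
model = torch.nn.Sequential(torch.nn.Linear(4, 4))
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 4)
y = qmodel(x)        # OK: the argument is a Tensor
# qmodel(x.float)    # wrong: `x.float` without parentheses is a bound method,
#                    # which triggers "Expected a value of type 'Tensor' ...
#                    # but instead found type 'method'"
```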

Here is the quantization code I wrote:

# Applying dynamic quantization to the model.
# Embedding modules need the float_qparams weight-only qconfig.
for _, mod in model_fp32.named_modules():
    if isinstance(mod, torch.nn.Embedding):
        mod.qconfig = torch.ao.quantization.float_qparams_weight_only_qconfig
model_qint8 = torch.ao.quantization.quantize_dynamic(
    model_fp32, 
    {
        torch.nn.Embedding,
        torch.nn.Linear
    },
    dtype=torch.qint8
)

Did I mess something up? Please help me. :sob: