Hi @hafezmg48, this references the old CPU-only quantization code. Are you trying to build on CPUs? Our new GPU-friendly APIs are over at pytorch/ao (GitHub - pytorch/ao: PyTorch native quantization and sparsity for training and inference).
As for why you see the error: it's not clear from the snippet. To debug it, I would read the source of the quantized functional linear (pytorch/torch/nn/quantized/functional.py at 3e1fc85b23f9f12ff2ba5be645841bde90dba14e · pytorch/pytorch) and find the line at which your code stops giving sensible results.
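If it helps, here is a rough way to narrow down where things go wrong: emulate the affine quantization arithmetic in plain Python and compare each stage against a float reference. This is only a sketch and not the PyTorch implementation. All function names, scales, and zero points below are made up for illustration, and the real kernels may round or accumulate differently.

```python
# Hypothetical helpers that emulate affine quantization in plain Python.
# They are NOT PyTorch's kernels; they only help spot which stage diverges.

def quantize(values, scale, zero_point, qmin, qmax):
    """q = clamp(round(x / scale) + zero_point, qmin, qmax)."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(qvalues, scale, zero_point):
    """x is approximately (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in qvalues]

def float_linear(x, weight, bias):
    """Float reference: y = W x + b, with weight given as a list of rows."""
    return [sum(a * w for a, w in zip(x, row)) + b for row, b in zip(weight, bias)]

def emulated_quantized_linear(x, weight, bias, x_scale, x_zp, w_scale, out_scale, out_zp):
    # Stage 1: quantize activations (unsigned 8-bit) and weights (signed 8-bit, zero point 0).
    qx = quantize(x, x_scale, x_zp, 0, 255)
    qw = [quantize(row, w_scale, 0, -128, 127) for row in weight]
    # Stage 2: integer accumulation, rescaled back to real values, plus the float bias.
    acc = [
        sum((a - x_zp) * w for a, w in zip(qx, row)) * x_scale * w_scale + b
        for row, b in zip(qw, bias)
    ]
    # Stage 3: requantize to the output scale and zero point, then dequantize for comparison.
    qout = quantize(acc, out_scale, out_zp, 0, 255)
    return {"qx": qx, "qw": qw, "acc": acc, "out": dequantize(qout, out_scale, out_zp)}

# Made-up example data.
x = [0.5, -1.0, 2.0]
weight = [[0.1, 0.2, -0.3], [1.0, 0.5, 0.25]]
bias = [0.0, 0.1]

ref = float_linear(x, weight, bias)

# Output scale that covers the result range: the output stays close to the reference.
good = emulated_quantized_linear(x, weight, bias, 0.02, 128, 0.01, out_scale=0.02, out_zp=128)

# Output scale that is far too small: the accumulator is still fine, but the output saturates.
bad = emulated_quantized_linear(x, weight, bias, 0.02, 128, 0.01, out_scale=0.001, out_zp=128)

for name, result in (("good", good), ("bad", bad)):
    errors = [abs(o - r) for o, r in zip(result["out"], ref)]
    print(name, "acc:", result["acc"], "out:", result["out"], "abs error:", errors)
```

In the second run the accumulator still matches the float reference while the final output does not, which points at the output scale and zero point rather than at the inputs or weights. Doing the same stage-by-stage comparison against your own tensors should show which argument is off.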