How to implement the forward pass for a quantized linear layer?

Hi, I have a quantized model and I want to extract the model’s parameters and implement the forward pass of a quantized linear layer manually. But I don’t know how a quantized model performs its forward pass.
All I have found is this code in the PyTorch source:

def forward(self, x: torch.Tensor) -> torch.Tensor:
    return torch.ops.quantized.linear(
        x, self._packed_params._packed_params, self.scale, self.zero_point)

But I cannot find where torch.ops.quantized.linear is defined.

Can someone give me a hint on how to implement the forward of a quantized linear layer?

Hi, this calls the low-level (C++) quantized linear kernel. You can find its implementation in qlinear.cpp in the pytorch/pytorch repository on GitHub (master branch).
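If you only need a reference to check your own implementation against, the arithmetic the kernel performs can be sketched in plain Python. This is a minimal sketch, not the actual kernel: it assumes per-tensor affine quantization for both activations and weights, a float bias, and a quint8 output range of [0, 255]. The function names `quantize` and `quantized_linear` are made up for illustration. The real C++ kernel works on packed weights with integer accumulation and fused requantization, so results may differ by one quantization step in edge cases.

```python
def quantize(value, scale, zero_point, qmin=0, qmax=255):
    """Affine-quantize one float: round(value / scale) + zero_point, clamped.

    Python's round() rounds halves to even. The [0, 255] range is an
    assumption (quint8 output).
    """
    q = round(value / scale) + zero_point
    return max(qmin, min(qmax, q))


def quantized_linear(x_q, x_scale, x_zp, w_q, w_scale, w_zp,
                     bias, out_scale, out_zp):
    """Reference forward pass of a quantized linear layer (illustrative only).

    x_q:  integer codes of the input, shape [batch][in_features]
    w_q:  integer codes of the weight, shape [out_features][in_features]
    bias: float bias, shape [out_features]
    Returns the integer codes of the output, shape [batch][out_features].
    """
    y_q = []
    for x_row in x_q:
        out_row = []
        for j, w_row in enumerate(w_q):
            # Integer dot product after removing the zero points.
            acc = sum((xi - x_zp) * (wi - w_zp)
                      for xi, wi in zip(x_row, w_row))
            # Rescale the accumulator to a real value and add the float bias.
            y_float = acc * x_scale * w_scale + bias[j]
            # Requantize with the layer's output scale and zero point.
            out_row.append(quantize(y_float, out_scale, out_zp))
        y_q.append(out_row)
    return y_q


# Tiny example with hand-picked values (powers of two keep the floats exact).
x_q = [[4, 6]]             # real input:  [1.0, 2.0]  with scale 0.5, zp 2
w_q = [[4, -4], [2, 8]]    # real weight: [[1.0, -1.0], [0.5, 2.0]], scale 0.25
bias = [0.5, -1.0]
y_q = quantized_linear(x_q, 0.5, 2, w_q, 0.25, 0, bias,
                       out_scale=0.5, out_zp=10)
y_real = [[(q - 10) * 0.5 for q in row] for row in y_q]
print(y_q, y_real)
```

To feed it real values, the integer codes and quantization parameters can be read from the module: as far as I know, `weight()` and `bias()` on the quantized linear module return the quantized weight tensor and the float bias, and `int_repr()`, `q_scale()` and `q_zero_point()` on a quantized tensor give the codes, scale and zero point. The output scale and zero point are the `self.scale` and `self.zero_point` passed in the `forward` shown in the question. If the weight is quantized per channel, each output row needs its own weight scale and zero point instead of the single pair assumed here.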
