How to implement forward pass for a quantized linear?

Hi, I have a quantized model and I want to extract the model's parameters and implement the forward pass of a quantized linear layer manually, but I don't know how the quantized model performs its forward pass.
All I can find is this code from the PyTorch source:

def forward(self, x: torch.Tensor) -> torch.Tensor:
    return torch.ops.quantized.linear(
        x, self._packed_params._packed_params, self.scale, self.zero_point)

But I cannot find where torch.ops.quantized.linear is defined.

Can someone give me a hint on how to implement the forward of a quantized linear layer?

Hi, this calls the low-level (C++) quantized linear kernel. You can find its implementation in pytorch/qlinear.cpp at master · pytorch/pytorch · GitHub
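If you don't need bit-exactness with that fused C++ kernel, a common way to reproduce it in pure Python is: dequantize the input and weight, run a float `F.linear`, then requantize the result with the layer's output scale and zero point. Below is a minimal sketch under that assumption; `manual_quantized_linear` is my own helper name, not a PyTorch API, and the output is close to but not guaranteed identical to `torch.ops.quantized.linear`, since the fused kernel rounds at different points.

```python
import torch
import torch.nn.functional as F

def manual_quantized_linear(x_q, w_q, bias, out_scale, out_zero_point):
    # x_q: per-tensor quantized input (quint8)
    # w_q: per-tensor quantized weight (qint8), e.g. from layer.weight()
    # bias: float bias tensor or None
    # out_scale, out_zero_point: the layer's output quantization params
    y = F.linear(x_q.dequantize(), w_q.dequantize(), bias)
    # Requantize the float result to the layer's output dtype.
    return torch.quantize_per_tensor(y, out_scale, out_zero_point,
                                     torch.quint8)

# Example usage with hand-quantized tensors:
x_q = torch.quantize_per_tensor(torch.randn(2, 4), 0.1, 128, torch.quint8)
w_q = torch.quantize_per_tensor(torch.randn(3, 4), 0.05, 0, torch.qint8)
y_q = manual_quantized_linear(x_q, w_q, None, 0.2, 128)
```

For a bit-exact reimplementation you would instead work on the integer representations directly (subtract zero points, do an int32 matmul, rescale by `x_scale * w_scale / out_scale`, add the output zero point), which is what the C++ kernel linked above does.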


Hi, I also want to implement the forward pass for a quantized linear layer, but the model's state_dict only contains weight parameters. From the source code, it looks like we also need parameters named output_scale and output_zero_point. How can I get these?

For a single layer, they are simply layer.scale and layer.zero_point.
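To make that concrete, here is a small sketch, assuming a standard PyTorch build with a quantized engine available (e.g. fbgemm on x86): it constructs a quantized Linear module directly and reads the output quantization parameters off it.

```python
import torch

# A freshly constructed quantized Linear defaults to
# scale=1.0, zero_point=0; after convert() these hold the
# calibrated output quantization parameters.
qlinear = torch.ao.nn.quantized.Linear(in_features=4, out_features=3)

out_scale = qlinear.scale            # output scale (float)
out_zero_point = qlinear.zero_point  # output zero point (int)

# The quantized weight and float bias can be unpacked from the
# packed params (note: _weight_bias is an internal helper).
w_q, bias = qlinear._weight_bias()
```

These are the two values that the `forward` shown earlier passes into `torch.ops.quantized.linear` alongside the packed weights.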
