Suppose a custom module is added to a model with the structure `Linear → GELU → Linear` using FX graph manipulation: the GELU, which doesn't get quantized in FX mode, is replaced by a custom module that uses integer arithmetic to perform an approximation. I would like to know how to expose the quantized tensor as the input to this custom module.

I tried the following approaches for the custom module:

- Setting the `backend_config` for qint8 input and output
- Setting the `qconfig` to qint8 for activation and weight
- Manually inserting quant/dequant stubs in the custom module
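For reference, a minimal sketch of how such a model is prepared and converted with FX graph mode quantization (torch ≥ 1.13 APIs, global qconfig only; the module here uses a plain `nn.GELU` as a stand-in for the custom integer-arithmetic module, and the backend_config variants are omitted):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(256, 1024)
        self.gelu = nn.GELU()  # stand-in for the custom int-arithmetic module
        self.linear2 = nn.Linear(1024, 256)

    def forward(self, x):
        return self.linear2(self.gelu(self.linear1(x)))

model = Block().eval()
example_inputs = (torch.randn(1, 256),)
qconfig_mapping = QConfigMapping().set_global(get_default_qconfig("fbgemm"))

prepared = prepare_fx(model, qconfig_mapping, example_inputs)
prepared(*example_inputs)  # calibration pass
converted = convert_fx(prepared)

# Printing the generated forward shows where quantize/dequantize
# nodes land around the non-quantized activation:
print(converted.code)
```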

However, the converted model makes the problem clear: the custom module runs on the dequantized output of the previous linear layer, so its quantized-tensor attribute accesses have nothing to operate on.

```
GraphModule(
  (linear1): QuantizedLinear(in_features=256, out_features=1024, scale=1.0, zero_point=0, qscheme=torch.per_channel_affine)
  (gelu): Module()
  (linear2): QuantizedLinear(in_features=1024, out_features=256, scale=1.0, zero_point=0, qscheme=torch.per_channel_affine)
)

def forward(self, x):
    linear1_input_scale_0 = self.linear1_input_scale_0
    linear1_input_zero_point_0 = self.linear1_input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, linear1_input_scale_0, linear1_input_zero_point_0, torch.quint8); x = linear1_input_scale_0 = linear1_input_zero_point_0 = None
    linear1 = self.linear1(quantize_per_tensor); quantize_per_tensor = None
    dequantize_1 = linear1.dequantize(); linear1 = None
    gelu__c = self.gelu._c
    forward = gelu__c.forward(dequantize_1); gelu__c = dequantize_1 = None
    gelu_scale_0 = self.gelu_scale_0
    gelu_zero_point_0 = self.gelu_zero_point_0
    quantize_per_tensor_2 = torch.quantize_per_tensor(forward, gelu_scale_0, gelu_zero_point_0, torch.quint8); forward = gelu_scale_0 = gelu_zero_point_0 = None
    linear2 = self.linear2(quantize_per_tensor_2); quantize_per_tensor_2 = None
    dequantize_3 = linear2.dequantize(); linear2 = None
    return dequantize_3
```

This then results in the following error when running inference with the converted model: "Could not run 'aten::q_scale' (or 'aten::int_repr') with arguments from the CPU backend". Is there a way to perform integer arithmetic on the quantized tensor so that FX quantization is compatible with a custom module like this one:

```python
class customGELU(nn.Module):
    def __init__(self):
        super(customGELU, self).__init__()

    def forward(self, x):
        q_val = x.int_repr()
        q_appr = q_val >> 2
        q_appr_float = (q_appr.float() - x.q_zero_point()) * x.q_scale()
        return torch.quantize_per_tensor(q_appr_float, x.q_scale(), x.q_zero_point(), x.dtype)
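As an aside, the integer arithmetic itself can be checked without torch. With an affine scheme, real = (q - zero_point) * scale, so an operation on the real value (here, dividing by 4) has to be applied to the zero-point-corrected integer. A minimal torch-free sketch (the scale/zero_point values are made-up examples):

```python
SCALE, ZERO_POINT = 0.05, 10  # assumed example quantization parameters

def quantize(x):
    # affine quantization into the quint8 range
    q = round(x / SCALE) + ZERO_POINT
    return max(0, min(255, q))

def dequantize(q):
    return (q - ZERO_POINT) * SCALE

def int_quarter(q):
    # real/4 = ((q - z) / 4) * s, so shift the zero-point-corrected
    # value and add the zero point back
    return ((q - ZERO_POINT) >> 2) + ZERO_POINT

q = quantize(2.0)                    # -> 50
approx = dequantize(int_quarter(q))  # -> 0.5, i.e. 2.0 / 4 exactly
```

Note that shifting the raw `q` (as in `customGELU` above) also scales the zero-point contribution, which is exact only when the zero point is 0.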