Custom quantized Linear or Conv2d layer

Dear Users,

I would like to ask for pointers on how to extend nn.Linear and nn.Conv2d for post-training static quantization or quantization-aware training without rewriting a lot of code, such that the result can still be used with operator fusion etc. An example change would be to apply an affine transformation to the weights before calling the linear operation. Could someone help, please? Thanks!
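As a concrete illustration of the kind of change asked about here, the float module might look like the sketch below: a subclass of nn.Linear that applies a learnable per-output-channel affine transform to the weight before the matmul. The class name and the `weight_scale`/`weight_shift` parameters are hypothetical, not part of any PyTorch API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineWeightLinear(nn.Linear):
    """Illustrative Linear variant: transforms the weight with a
    learnable per-output-channel scale and shift before the matmul."""

    def __init__(self, in_features, out_features, bias=True):
        super().__init__(in_features, out_features, bias)
        # scale/shift broadcast over each row (output channel) of the weight
        self.weight_scale = nn.Parameter(torch.ones(out_features, 1))
        self.weight_shift = nn.Parameter(torch.zeros(out_features, 1))

    def forward(self, x):
        w = self.weight * self.weight_scale + self.weight_shift
        return F.linear(x, w, self.bias)

layer = AffineWeightLinear(4, 3)
out = layer(torch.randn(2, 4))
```

With the default initialization (scale 1, shift 0) this behaves exactly like a stock nn.Linear; the question is how to carry such a module through the quantization flow.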

The change in eager mode quantization will require inheriting from https://github.com/pytorch/pytorch/blob/master/torch/nn/quantized/modules/linear.py#L103 and also from the related fusion modules under the https://github.com/pytorch/pytorch/tree/master/torch/nn/intrinsic folder, and passing a white_list (https://github.com/pytorch/pytorch/blob/master/torch/quantization/quantize.py#L178) extended with the new module. It will require familiarity with the whole eager mode quantization flow.
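A minimal sketch of the flow described above: subclass the quantized Linear, then extend the float-to-quantized module mapping that convert() consults. Note that the mapping-related helper names have moved between releases (the `white_list` argument in older versions, `get_default_static_quant_module_mappings` and the `mapping` argument to convert() in later ones, with the code itself migrating toward torch.ao.quantization); this assumes a reasonably recent 1.x-style API and the fbgemm backend being available.

```python
import torch
import torch.nn as nn
import torch.quantization as tq
import torch.nn.quantized as nnq

class MyQuantizedLinear(nnq.Linear):
    """Hypothetical subclass of the quantized Linear. from_float is the
    hook convert() calls to swap a float module for a quantized one."""

    @classmethod
    def from_float(cls, mod, *args, **kwargs):
        # Reuse the stock conversion; the parent implementation
        # constructs via cls(...), so this returns MyQuantizedLinear.
        return super().from_float(mod, *args, **kwargs)

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.fc = nn.Linear(4, 3)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

m = M().eval()
m.qconfig = tq.get_default_qconfig("fbgemm")
tq.prepare(m, inplace=True)   # insert observers per the qconfig
m(torch.randn(8, 4))          # calibration pass

# Extend the default float->quantized mapping with the custom module.
mapping = dict(tq.get_default_static_quant_module_mappings())
mapping[nn.Linear] = MyQuantizedLinear
tq.convert(m, mapping=mapping, inplace=True)
```

After convert(), `m.fc` is an instance of `MyQuantizedLinear`. Supporting fusion (e.g. Linear + ReLU) additionally means providing the corresponding intrinsic modules, as noted above.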

Thanks Jerry, this is what I initially thought, but I wanted to double-check if my assumption was right. Thank you!
