Implementing a Quantized Linear Layer in NumPy

Your process looks fine. I referred to this when reverse engineering the QuantizedLinear op.
The one thing to watch out for is that integer overflow can happen in

matmul_out = np.matmul(x_q, fc_weight.T)

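If the inputs are stored in a narrow integer type, the products and their sum can exceed that type's range and wrap around silently. Try again with both operands cast to int32 first: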

matmul_out = np.matmul(x_q.astype(np.int32), fc_weight.T.astype(np.int32))
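To see why the cast matters, here is a minimal sketch. It assumes x_q and fc_weight are stored as int8, which the question may or may not match; the shapes and values are made up purely for illustration.

```python
import numpy as np

# Hypothetical quantized inputs, assumed to be int8 for this illustration.
x_q = np.full((1, 4), 100, dtype=np.int8)        # one row of activations
fc_weight = np.full((2, 4), 100, dtype=np.int8)  # two output features

# With int8 operands NumPy keeps the result in int8, so the sum of
# products wraps around silently instead of raising an error.
narrow = np.matmul(x_q, fc_weight.T)

# Casting both operands to int32 first gives the accumulator enough range.
wide = np.matmul(x_q.astype(np.int32), fc_weight.T.astype(np.int32))

print(narrow.dtype, narrow)  # int8 result, values have wrapped
print(wide.dtype, wide)      # int32 result, 4 * 100 * 100 = 40000 per entry
```

If the two results differ on your real data, the narrow matmul was overflowing. If your arrays are already int32 or wider, the cast is harmless.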