Your process looks fine. I refer to this for reverse engineering the QuantizedLinear op.
The one thing to watch out for is that overflow can happen in:
matmul_out = np.matmul(x_q, fc_weight.T)
Try again with both operands cast to int32 before the multiplication:
matmul_out = np.matmul(x_q.astype(np.int32), fc_weight.T.astype(np.int32))
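To see why the cast matters, here is a minimal sketch. It assumes both x_q and fc_weight are int8 arrays, which your post does not confirm, and the shapes and values are made up for illustration only.

```python
import numpy as np

# Hypothetical stand-ins for the quantized activations and weights.
# int8 is an assumption; the real dtypes in your pipeline may differ.
x_q = np.full((1, 4), 100, dtype=np.int8)
fc_weight = np.full((2, 4), 100, dtype=np.int8)

# The result dtype follows the inputs (int8 here), so each dot product
# wraps around instead of holding the true sum.
narrow = np.matmul(x_q, fc_weight.T)

# Casting first makes both the products and the accumulation use int32.
wide = np.matmul(x_q.astype(np.int32), fc_weight.T.astype(np.int32))

print(narrow.dtype, narrow)
print(wide.dtype, wide)  # each entry is 4 * 100 * 100 = 40000
```

Note that casting the result afterwards, as in np.matmul(x_q, fc_weight.T).astype(np.int32), would not help, because the values have already wrapped by then.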