Weight Quantization

I’m trying to implement a fixed-point version of VGG-16. I want to start from the pre-trained VGG-16 with floating-point weights, then add a quantization layer before each convolutional layer that quantizes the floating-point weights into a fixed-point format (e.g., 8 bits) before they are multiplied by the feature maps in the convolutional layers. My quantization function is:

wq = clip(round(w/stp), a,b)

where w, wq, stp, a, and b are the floating-point weight, the quantized weight, the step size, the minimum value, and the maximum value, respectively.
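To make the formula concrete, here is a toy numeric check with made-up values (the step size 0.05 and the sample weights are just examples, and a, b are set to the signed 8-bit range):

```python
import torch

# wq = clip(round(w / stp), a, b) with a/b chosen as the signed 8-bit range
w = torch.tensor([0.37, -1.2, 5.0])  # example floating-point weights
stp = 0.05                            # example step size
a, b = -128, 127                      # min/max integer codes for 8 bits
wq = torch.clamp(torch.round(w / stp), a, b)
# wq holds the integer codes; wq * stp gives the dequantized weights
```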

Then I want to fine-tune my model with the quantized weights.
So far, I have defined a new quantization layer that accepts the floating-point weights as input and returns their quantized values. Here are my questions:

  1. Is this the best way to implement a fixed-point network?
  2. Do I need to define a new backward method to be used during the training (fine-tuning) process?
  3. How can I feed the weights of each convolutional layer to the quantization layer? In other words, how can I use this layer in my model architecture?

import torch
import torch.nn as nn
from torch.nn import Parameter

class Linear_Quantization(nn.Module):
  def __init__(self):
    super().__init__()
    self.bit_width = 8
    # Learnable step size; initialized to 1 rather than left as
    # uninitialized memory (torch.Tensor(1) is not initialized).
    self.step_size = Parameter(torch.ones(1))

  def forward(self, x):
    # wq = clip(round(w / stp), a, b), then scale back by stp
    x = torch.round(x / self.step_size)
    x = torch.clamp(x, min=-2 ** (self.bit_width - 1),
                    max=2 ** (self.bit_width - 1) - 1)
    x = x * self.step_size
    return x
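For context on question 3, here is a rough sketch of how I imagine wiring the quantizer into a convolution: wrap `nn.Conv2d`, quantize its weight on every forward pass, and call `F.conv2d` with the quantized weight. The name `QuantConv2d`, the step-size initialization, and the hyperparameters are all mine, and this sketch does not yet address the backward pass through `round` (question 2):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantConv2d(nn.Module):
    # Hypothetical wrapper (name is mine): quantize the conv weight
    # each forward pass, then run the convolution with that weight.
    def __init__(self, in_ch, out_ch, kernel_size, bit_width=8, **kw):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, **kw)
        self.bit_width = bit_width
        # Learnable step size; 0.05 is an arbitrary starting value.
        self.step_size = nn.Parameter(torch.tensor(0.05))

    def forward(self, x):
        qmin = -2 ** (self.bit_width - 1)
        qmax = 2 ** (self.bit_width - 1) - 1
        w = self.conv.weight
        # Quantize then dequantize: wq = clip(round(w / stp), a, b) * stp
        wq = torch.clamp(torch.round(w / self.step_size), qmin, qmax) * self.step_size
        return F.conv2d(x, wq, self.conv.bias,
                        stride=self.conv.stride, padding=self.conv.padding)
```

The idea would then be to replace each `nn.Conv2d` in the pre-trained VGG-16 with such a wrapper and copy the pre-trained weights into `self.conv` before fine-tuning.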