I’m trying to implement a fixed-point version of VGG 16. I want to start from the pre-trained VGG 16 with floating-point weights, then add a quantization layer before each convolutional layer that quantizes the floating-point weights into a fixed-point format (e.g., 8 bits) before they are multiplied by the feature maps in the convolutional layers. My quantization function is:

wq = clip(round(w/stp), a,b)

where w, wq, stp, a, and b are the floating-point weight, quantized weight, step size, minimum value, and maximum value, respectively.
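As a concrete check of the formula, here is a minimal pure-Python sketch. It assumes an 8-bit signed range (a = -128, b = 127, matching the clamp in the `Linear_Quantization` module) and an illustrative step size of 0.01; `quantize` and `dequantize` are hypothetical helper names. The formula itself yields integer codes, and multiplying them back by `stp` gives the values the module returns.

```python
def quantize(w, stp, a=-128, b=127):
    """Integer codes wq = clip(round(w / stp), a, b) for a list of float weights."""
    return [min(max(round(x / stp), a), b) for x in w]


def dequantize(wq, stp):
    """Map integer codes back to floating point on the fixed-point grid: wq * stp."""
    return [q * stp for q in wq]


weights = [0.503, -0.2512, 0.0049, 2.0, -3.0]
stp = 0.01
codes = quantize(weights, stp)
print(codes)  # [50, -25, 0, 127, -128]
print(dequantize(codes, stp))
```

Weights beyond roughly ±127·stp saturate, so the choice of `stp` trades resolution against clipping range. Python's `round` and `torch.round` both round halves to even, so they agree on ties.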

Then I want to fine-tune my model with the quantized weights.

So far, I have defined a new quantization layer that accepts the floating-point weights as input and returns their quantized values. Here are my questions:

- Is this the best way to implement a fixed-point network?
- Do I need to define a new backward method to be used during the training (fine-tuning) process?
- How can I feed the weights of each convolutional layer to the quantization layer? In other words, how can I use this layer in my model architecture?
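Regarding the backward pass: `torch.round` has zero gradient almost everywhere, so plain autograd would pass essentially no gradient back to the floating-point weights. A common workaround in quantization-aware training is the straight-through estimator (STE), which applies `round` in the forward pass and treats it as the identity in the backward pass. The sketch below assumes that choice; `RoundSTE` and `fake_quantize` are hypothetical names, and the 8-bit signed range mirrors the clamp in the `Linear_Quantization` module.

```python
import torch


class RoundSTE(torch.autograd.Function):
    """round() in the forward pass, identity gradient in the backward pass (STE)."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Pretend d round(x)/dx = 1 so gradients reach the float weights
        return grad_output


def fake_quantize(w, step_size, bit_width=8):
    """clip(round(w / step), a, b) * step, using the straight-through round."""
    a, b = -2 ** (bit_width - 1), 2 ** (bit_width - 1) - 1
    q = torch.clamp(RoundSTE.apply(w / step_size), a, b)
    return q * step_size


w = torch.tensor([0.503, -0.2512, 2.0], requires_grad=True)
out = fake_quantize(w, torch.tensor(0.01))
out.sum().backward()
print(out.detach(), w.grad)
```

With this, the clamp still zeroes the gradient for weights outside the representable range (the third entry here), while in-range weights receive the incoming gradient unchanged.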

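```python
import torch
import torch.nn as nn
from torch.nn import Parameter


class Linear_Quantization(nn.Module):
    def __init__(self, bit_width=8, init_step=0.01):
        super().__init__()
        self.bit_width = bit_width
        # torch.Tensor(1) is uninitialized; start from an explicit (assumed) value
        self.step_size = Parameter(torch.tensor([init_step]))

    def forward(self, x):
        # Scale to integer levels (torch.round has zero gradient almost everywhere)
        x = torch.round(x / self.step_size)
        # Clip to the signed range [-2^(b-1), 2^(b-1) - 1]
        x = torch.clamp(x, min=-2**(self.bit_width - 1), max=2**(self.bit_width - 1) - 1)
        # Rescale so the output stays in floating point
        x = x * self.step_size
        return x
```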
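For the third question, one option (an assumption, not the only design) is to fold the quantizer into the convolution itself: subclass `nn.Conv2d`, quantize `self.weight` on every forward pass, and call `F.conv2d` with the result. The optimizer then keeps updating the floating-point weights while the forward pass only sees fixed-point values. `QuantConv2d` and `quantize_convs` are hypothetical names; the per-layer step initialization (max |w| / 127), the `x + (round(x) - x).detach()` straight-through trick, and the toy `nn.Sequential` standing in for VGG 16 are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuantConv2d(nn.Conv2d):
    """Conv2d that fake-quantizes its float weights on every forward pass (hypothetical)."""

    def __init__(self, *args, bit_width=8, **kwargs):
        super().__init__(*args, **kwargs)
        self.bit_width = bit_width
        # Assumption: one learnable step size per layer, set from the weights below
        self.step_size = nn.Parameter(torch.tensor(0.01))

    def quantized_weight(self):
        a, b = -2 ** (self.bit_width - 1), 2 ** (self.bit_width - 1) - 1
        x = self.weight / self.step_size
        # Straight-through round: forward uses round(x), backward treats it as identity
        x = x + (torch.round(x) - x).detach()
        return torch.clamp(x, a, b) * self.step_size

    def forward(self, x):
        # Only zero padding is handled (padding_mode is ignored)
        return F.conv2d(x, self.quantized_weight(), self.bias,
                        self.stride, self.padding, self.dilation, self.groups)


def quantize_convs(model, bit_width=8):
    """Recursively replace nn.Conv2d layers with QuantConv2d, keeping the trained weights."""
    for name, child in model.named_children():
        if isinstance(child, nn.Conv2d) and not isinstance(child, QuantConv2d):
            q = QuantConv2d(child.in_channels, child.out_channels, child.kernel_size,
                            stride=child.stride, padding=child.padding,
                            dilation=child.dilation, groups=child.groups,
                            bias=child.bias is not None, bit_width=bit_width)
            with torch.no_grad():
                q.weight.copy_(child.weight)
                if child.bias is not None:
                    q.bias.copy_(child.bias)
                # Assumption: symmetric step so the largest |w| maps to the top level
                q.step_size.fill_(child.weight.abs().max().item() / (2 ** (bit_width - 1) - 1))
            setattr(model, name, q)
        else:
            quantize_convs(child, bit_width)
    return model


# Toy stand-in for a pretrained network
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 4, 3, padding=1))
model = quantize_convs(model)
out = model(torch.randn(2, 3, 16, 16))
print(out.shape)
```

For VGG 16 you would pass the torchvision model (e.g. `torchvision.models.vgg16` loaded with its ImageNet weights) to `quantize_convs` before moving it to the GPU, since the new layers are created on the CPU, and then fine-tune as usual.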