I’m trying to implement a fixed-point version of VGG-16. I want to start from the pre-trained VGG-16 with floating-point weights, then add a quantization layer before each convolutional layer that quantizes the floating-point weights into a fixed-point format (e.g., 8 bits) before they are multiplied by the feature map in the convolutional layers. My quantization function is:
wq = clip(round(w/stp), a, b)
where w, wq, stp, a, and b are the floating-point weight, the quantized weight, the step size, the minimum value, and the maximum value, respectively.
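To make the mapping concrete, here is a tiny numeric sketch of that function in plain Python (the step size and clip range below are just example values for a signed 8-bit code):

```python
def quantize(w, stp, a, b):
    # wq = clip(round(w/stp), a, b) -- the integer code, before scaling back
    return max(a, min(b, round(w / stp)))

# signed 8-bit range: a = -128, b = 127
stp = 0.05
print(quantize(0.237, stp, -128, 127))   # round(4.74)  -> 5
print(quantize(-9.99, stp, -128, 127))   # round(-199.8) clips to -128
```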
Then I want to fine-tune the model with the quantized weights.
So far, I have defined a new quantization layer that accepts the floating-point weight as input and returns its quantized value. Here are my questions:
- Is this the best way to implement a fixed-point network?
- Do I need to define a custom backward method to be used during the training (fine-tuning) process?
- How can I feed the weight of each convolutional layer into the quantization layer? In other words, how can I use this layer in my model architecture?
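On the backward question: since round and clamp have zero gradient almost everywhere, the approach I have seen mentioned is a straight-through estimator (STE), which passes the gradient through as if the rounding were the identity. A sketch of what I think that would look like (class name is my own):

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Round in the forward pass, pass the gradient through unchanged."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: pretend round() was the identity
        return grad_output

x = torch.tensor([0.3, 1.7], requires_grad=True)
y = RoundSTE.apply(x)
y.sum().backward()
print(y)       # tensor([0., 2.])
print(x.grad)  # tensor([1., 1.])
```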
Here is the layer I have so far:

```python
import torch
import torch.nn as nn
from torch.nn.parameter import Parameter

class Linear_Quantization(nn.Module):
    def __init__(self):
        super().__init__()
        self.bit_width = 8
        # learnable step size; initialized to 1 (torch.Tensor(1) would be uninitialized)
        self.step_size = Parameter(torch.ones(1))

    def forward(self, x):
        x = torch.round(x / self.step_size)
        x = torch.clamp(x, min=-2**(self.bit_width - 1), max=2**(self.bit_width - 1) - 1)
        x = x * self.step_size  # scale back to the floating-point range
        return x
```
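For the last question, the only pattern I have come up with is to wrap each pre-trained conv layer and call F.conv2d myself with the quantized weight, so the float weights stay as the trainable parameters. A sketch (wrapper name and the simple fake-quant function are my own, not from any library):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantConv2d(nn.Module):
    """Wraps an nn.Conv2d and quantizes its weight on every forward pass."""
    def __init__(self, conv, quantizer):
        super().__init__()
        self.conv = conv        # pre-trained nn.Conv2d, keeps float weights
        self.quant = quantizer  # any callable, e.g. Linear_Quantization()

    def forward(self, x):
        wq = self.quant(self.conv.weight)   # quantize weights, not activations
        return F.conv2d(x, wq, self.conv.bias,
                        stride=self.conv.stride, padding=self.conv.padding,
                        dilation=self.conv.dilation, groups=self.conv.groups)

# hypothetical usage on one layer, with a simple 8-bit fake quantizer:
conv = nn.Conv2d(3, 8, 3, padding=1)
fake_quant = lambda w: torch.clamp(torch.round(w / 0.01), -128, 127) * 0.01
qconv = QuantConv2d(conv, fake_quant)
out = qconv(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 8, 32, 32])
```

For the full VGG-16, I imagine walking the model's modules and replacing each nn.Conv2d with such a wrapper, but I am not sure this is the intended pattern.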