Weight Quantization

I’m trying to implement a fixed-point version of VGG16. I want to start from the pre-trained VGG16 with floating-point weights, then add a quantization layer before each convolutional layer that quantizes the floating-point weights into a fixed-point format (e.g., 8 bits) before they are multiplied by the feature map in the convolutional layer. My quantization function is:

wq = clip(round(w/stp), a,b)

where w, wq, stp, a, and b are the floating-point weight, quantized weight, step size, min value, and max value, respectively.
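As a quick sanity check of the formula above, here is a minimal sketch in PyTorch. The weight values, the step size of 0.25, and the signed 8-bit range [-128, 127] are illustrative assumptions, not values from my model.

```python
import torch

def quantize(w, stp, a, b):
    # wq = clip(round(w / stp), a, b)
    return torch.clamp(torch.round(w / stp), a, b)

# Illustrative values only (assumed for this example)
w = torch.tensor([-40.0, -0.3, 0.1, 0.6, 35.0])
stp = 0.25            # step size
a, b = -128, 127      # signed 8-bit range

wq = quantize(w, stp, a, b)   # integer-valued levels: -128, -1, 0, 2, 127
w_hat = wq * stp              # values the convolution would actually see
print(wq, w_hat)
```

The two outer values saturate at a and b, while the inner ones snap to the nearest multiple of the step size.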

Then I want to fine-tune my model with the quantized weights.
So far, I have defined a new quantization layer that accepts the floating-point weight as input and returns the quantized weight. Here are my questions:

  1. Is this the best way to implement a fixed-point network?
  2. Do I need to define a new backward method to be used during the training (fine-tuning) process?
  3. How can I feed the weight of each convolutional layer to the quantization layer? Or how can I use this layer in my model architecture?
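
import torch
import torch.nn as nn

class Linear_Quantization(nn.Module):
  def __init__(self, bit_width=8):
    super().__init__()
    self.bit_width = bit_width
    # Learnable step size, initialized to 1 rather than left uninitialized
    self.step_size = nn.Parameter(torch.ones(1))
  def forward(self, x):
    # Map to the integer grid, clip to the signed range, then rescale
    x = torch.round(x/self.step_size)
    x = torch.clamp(x, min = -2**(self.bit_width-1) , max = (2**(self.bit_width-1)-1))
    x = x*self.step_size
    return x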
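
Regarding questions 2 and 3, one common approach (not necessarily the best one) is a straight-through estimator: torch.round has zero gradient almost everywhere, so a custom autograd function rounds in the forward pass and passes the gradient through unchanged in the backward pass. The sketch below also shows one possible way to apply the quantizer to each convolution's own weight by subclassing nn.Conv2d. The names RoundSTE, QuantConv2d, reset_step_size, and quantize_convs are hypothetical, and the max-abs step size initialization is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoundSTE(torch.autograd.Function):
    """Round in forward; pass the gradient through unchanged in backward."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

class QuantConv2d(nn.Conv2d):
    """Hypothetical Conv2d that quantizes its own weight on every forward."""
    def __init__(self, *args, bit_width=8, **kwargs):
        super().__init__(*args, **kwargs)
        self.bit_width = bit_width
        self.step_size = nn.Parameter(torch.ones(1))
        self.reset_step_size()

    def reset_step_size(self):
        # Assumed initialization: the largest weight maps to the top level
        qmax = 2 ** (self.bit_width - 1) - 1
        with torch.no_grad():
            self.step_size.fill_(self.weight.abs().max().item() / qmax)

    def quantized_weight(self):
        qmin = -2 ** (self.bit_width - 1)
        qmax = 2 ** (self.bit_width - 1) - 1
        w = RoundSTE.apply(self.weight / self.step_size)
        w = torch.clamp(w, qmin, qmax)
        return w * self.step_size

    def forward(self, x):
        return F.conv2d(x, self.quantized_weight(), self.bias,
                        self.stride, self.padding, self.dilation, self.groups)

def quantize_convs(model, bit_width=8):
    """Recursively swap every nn.Conv2d for a QuantConv2d with the same weights."""
    for name, child in list(model.named_children()):
        if isinstance(child, nn.Conv2d) and not isinstance(child, QuantConv2d):
            q = QuantConv2d(child.in_channels, child.out_channels, child.kernel_size,
                            stride=child.stride, padding=child.padding,
                            dilation=child.dilation, groups=child.groups,
                            bias=child.bias is not None, bit_width=bit_width)
            # strict=False because the source layer has no step_size entry
            q.load_state_dict(child.state_dict(), strict=False)
            q.reset_step_size()
            setattr(model, name, q)
        else:
            quantize_convs(child, bit_width)
    return model

# Toy stand-in for a pre-trained network
torch.manual_seed(0)
model = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(), nn.Conv2d(4, 2, 3))
model = quantize_convs(model, bit_width=8)
out = model(torch.randn(1, 3, 8, 8))
out.sum().backward()  # gradients reach both the weights and the step sizes
```

The same quantize_convs call should work on a pre-trained VGG16 loaded from torchvision, after which the model can be fine-tuned as usual. The floating-point weights remain the trainable parameters, and only the quantized copy is used in the convolution.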

Hey Amin,
Were you able to find answers to these questions? I am trying to solve a similar issue, and it would be helpful if you could share what you found.

Hey Samira,

Recently, PyTorch has added a wide range of quantization approaches. You can find more details here.


Thank you, Amin
I was looking to quantize a model to 4 bits; currently PyTorch has it implemented for 8 bits. I was trying to create my own QConfig and provide FakeQuantize with the min level and max level. Do you think this is the right way of achieving it?

I am trying to do the same thing, Sairam954. Has it worked for you that way? Technically, it makes sense.
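
For anyone trying the same thing, here is a rough sketch of what such a custom QConfig might look like. It assumes a PyTorch version that ships torch.ao.quantization (older releases expose the same classes under torch.quantization), and the ranges [-8, 7] for weights and [0, 15] for activations are my assumption for "4 bit". Note that this only simulates 4-bit levels during training; whether the converted model's kernels actually store or compute in 4 bits is a separate question.

```python
import torch
from torch.ao.quantization import FakeQuantize, MovingAverageMinMaxObserver, QConfig

# Weights: signed, symmetric, 16 levels in [-8, 7] (assumed 4-bit range)
weight_fq_4bit = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=-8, quant_max=7,
    dtype=torch.qint8, qscheme=torch.per_tensor_symmetric)

# Activations: unsigned, affine, 16 levels in [0, 15] (assumed 4-bit range)
act_fq_4bit = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=0, quant_max=15,
    dtype=torch.quint8, qscheme=torch.per_tensor_affine)

qconfig_4bit = QConfig(activation=act_fq_4bit, weight=weight_fq_4bit)

# Quick check on a random weight tensor: count the distinct quantization levels
fq = qconfig_4bit.weight()
w = torch.randn(16, 3, 3, 3)
w_q = fq(w)                                   # fake-quantized, still float
levels = torch.unique(torch.round(w_q / fq.scale))
print(levels.numel())                         # should not exceed 16
```

The qconfig_4bit object would then be assigned to the model's qconfig attribute before preparing it for quantization-aware training, in the same way as the built-in 8-bit configs.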