# Convert floating point 32 bit of input and pretrained weight to 8bit

• I am using an AlexNet model where 7 layers are binarized (input and weight) and the 1st layer is not binarized (input and weight are 32-bit floating point). I want only the 1st layer's input and weight to be converted to 8 bit before they are sent into the convolution function, without affecting the other layers.

• I am using pretrained weights here

Just to make it clear – when you say “convert to 8 bit”, are you using quantization or are you just casting the types down? Also, we don’t support quantization lower than 8 bits, so binarization of the layers might not be supported without custom hacks.
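For reference, the difference between the two can be sketched like this (a minimal illustration with made-up values, not from the original post):

```python
import torch

x = torch.tensor([0.5, -1.25, 2.0])

# Plain casting truncates toward zero and throws away the fractional part.
cast = x.to(torch.int8)  # tensor([0, -1, 2], dtype=torch.int8)

# Quantization stores int8 values together with a scale and zero point,
# so the original range can be approximately recovered.
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)
back = q.dequantize()  # close to x, up to the quantization step of 0.1
```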

Lastly, if you already have the weights, and you just need an 8-bit model, you can follow these steps:

1. Make sure your model is quantizable – all layers in your network must be stateful and unique, that is, no “implied” layers in the forward and no in-place computation
2. Prepare the model using `prepare` function
3. Calibrate the prepared model by running your data through it at least once
4. Convert your model to the quantized version.
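The four steps can be sketched end to end (a minimal example with made-up layer sizes, assuming the eager-mode API under `torch.quantization`):

```python
import torch
import torch.nn as nn

class Small(nn.Module):
    def __init__(self):
        super(Small, self).__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3)      # arbitrary sizes for illustration
        self.relu = nn.ReLU(inplace=False)  # stateful, unique, not in-place
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = Small().eval()
# Step 1: the model above is already quantizable.
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
prepared = torch.quantization.prepare(model)       # step 2: insert observers
prepared(torch.randn(1, 3, 32, 32))                # step 3: calibrate
quantized = torch.quantization.convert(prepared)   # step 4: int8 model
```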

On the first point:

This model cannot be quantized:

```python
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # In-place ReLU shared between two inputs: no per-call state to observe
        self.relu = nn.ReLU(inplace=True)

    def forward(self, a, b):
        ra = self.relu(a)  # the same module is reused ("implied" layer)
        rb = self.relu(b)
        return ra + rb     # functional `+` has no module to attach an observer to
```

To make the model quantizable, you need to make sure there are no in-place operations and that every operation can save its state:

```python
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.relu_a = nn.ReLU(inplace=False)
        self.relu_b = nn.ReLU(inplace=False)
        # Stateful replacement for the functional `+`
        self.F = nn.quantized.FloatFunctional()

    def forward(self, a, b):
        ra = self.relu_a(a)
        rb = self.relu_b(b)
        return self.F.add(ra, rb)
```

If you want the model to take FP32 input and return FP32 output, you will need to insert a `QuantStub`/`DeQuantStub` at the appropriate locations:

```python
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.quant_stub_a = torch.quantization.QuantStub()
        self.quant_stub_b = torch.quantization.QuantStub()
        self.relu_a = nn.ReLU(inplace=False)
        self.relu_b = nn.ReLU(inplace=False)
        self.F = nn.quantized.FloatFunctional()
        self.dequant_stub = torch.quantization.DeQuantStub()

    def forward(self, a, b):
        qa = self.quant_stub_a(a)
        qb = self.quant_stub_b(b)
        ra = self.relu_a(qa)
        rb = self.relu_b(qb)
        out = self.F.add(ra, rb)
        return self.dequant_stub(out)  # back to FP32
```

Similarly, if you would like to quantize only a single layer, you would need to place the quant/dequant stubs only around what you want to quantize. Please note that you would need to specify the quantization parameters appropriately:

```python
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.quant_stub_a = torch.quantization.QuantStub()
        self.relu_a = nn.ReLU(inplace=False)
        self.relu_b = nn.ReLU(inplace=False)
        self.dequant_stub = torch.quantization.DeQuantStub()

    def forward(self, a, b):
        qa = self.quant_stub_a(a)   # quantized branch
        ra = self.relu_a(qa)
        a = self.dequant_stub(ra)   # back to FP32
        rb = self.relu_b(b)         # this branch stays in FP32
        return a + rb
```

The model above will be partially quantized, and you would need to give the qconfig only to the quant stub and the relu you want quantized.
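One convenient way to express that (a sketch, assuming the partially-quantized model above): set a model-wide qconfig and then explicitly disable it on the branch that should stay in FP32 by setting its qconfig to `None`:

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.quant_stub_a = torch.quantization.QuantStub()
        self.relu_a = nn.ReLU(inplace=False)
        self.relu_b = nn.ReLU(inplace=False)
        self.dequant_stub = torch.quantization.DeQuantStub()

    def forward(self, a, b):
        qa = self.quant_stub_a(a)   # quantized branch
        ra = self.relu_a(qa)
        a = self.dequant_stub(ra)   # back to FP32
        rb = self.relu_b(b)         # this branch stays in FP32
        return a + rb

model = Model().eval()
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
model.relu_b.qconfig = None  # leave this branch in FP32

prepared = torch.quantization.prepare(model)
prepared(torch.randn(8), torch.randn(8))  # calibrate
quantized = torch.quantization.convert(prepared)
```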
