How to export the weight after quantization

Hi, I have defined a neural network with fully connected layers and applied post-training static quantization to it. I am using PyTorch 2.0.0+cu118.
Here is the network architecture and the quantization process:

import torch
import torch.nn as nn

class HPC(nn.Module):
    def __init__(self, input_features, out_features):
        super(HPC, self).__init__()
        self.linear1 = nn.Linear(input_features, 512)
        self.linear2 = nn.Linear(512, 256)
        self.linear3 = nn.Linear(256, 64)
        self.linear4 = nn.Linear(64, out_features)
        self.sigmoid = nn.Sigmoid()
        self.quant = torch.ao.quantization.QuantStub()
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, n_input):
        out = self.quant(n_input)
        out = self.linear1(out)
        out = self.linear2(out)
        out = self.linear3(out)
        out = self.sigmoid(self.linear4(out))
        out = self.dequant(out)
        return out

import copy

# copy the trained model and switch it to eval mode
myModel = copy.deepcopy(model)
myModel.eval()

# attach a quantization config and insert observers
myModel.qconfig = torch.ao.quantization.default_qconfig
print(myModel.qconfig)
torch.ao.quantization.prepare(myModel, inplace=True)

# calibrate with a subset of the training data
for i in range(200):
    X_train_arr_tensor = torch.Tensor(X_train_arr[i].flatten()).cuda()
    X_train_arr_tensors = torch.unsqueeze(X_train_arr_tensor, 0)
    myModel(X_train_arr_tensors)

# convert the calibrated model to a quantized model
torch.ao.quantization.convert(myModel, inplace=True)

After completing these steps, I printed the keys of the quantized model's state_dict; the output is shown below.

for i in myModel.state_dict():
    print(i)

linear1.scale
linear1.zero_point
linear1._packed_params.dtype
linear1._packed_params._packed_params
linear2.scale
linear2.zero_point
linear2._packed_params.dtype
linear2._packed_params._packed_params
linear3.scale
linear3.zero_point
linear3._packed_params.dtype
linear3._packed_params._packed_params
linear4.scale
linear4.zero_point
linear4._packed_params.dtype
linear4._packed_params._packed_params
leaky.scale
leaky.zero_point
quant.scale
quant.zero_point

When I printed out myModel.linear1._packed_params, I found that the scale and zero_point values in _packed_params were not equal to myModel.linear1.scale and myModel.linear1.zero_point.

print(myModel.linear1._packed_params)

(tensor([[-0.0381, -0.0122,  0.0366,  ..., -0.0059,  0.0067, -0.0362],
        [ 0.0196,  0.0037, -0.0277,  ...,  0.0074, -0.0362,  0.0126],
        ...,
        [ 0.0126,  0.0403, -0.0104,  ...,  0.0359, -0.0107,  0.0263]],
       device='cuda:0', size=(512, 504), dtype=torch.qint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.00036982630263082683,
       zero_point=0), Parameter containing:
tensor([-2.7267e-02, -4.1525e-02,  3.3713e-02, ..., -3.9943e-02], device='cuda:0',
       requires_grad=True))

print("this is scale: ",myModel.linear1.scale)
print("this is zero_point:",myModel.linear1.zero_point)

this is scale: 0.030707480385899544
this is zero_point: 63

The quantized weights and biases of the fully connected layers do not show up here as regular tensors. How can I obtain the weights and parameters for each layer?


After quantization, the weights are stored in a packed format so the quantized kernels can execute efficiently, and they are not directly accessible as regular tensors. Also note that linear1.scale and linear1.zero_point are the quantization parameters of the layer's output activations, while the scale and zero_point printed inside _packed_params belong to the weight tensor itself, which is why the two sets of values do not match. Try this function.

import torch
import torch.ao.nn.quantized as nnq

def print_quantized_weights(model):
    for name, module in model.named_modules():
        if isinstance(module, nnq.Linear):
            # weight()/bias() unpack the packed params and return the
            # quantized weight tensor and the float bias
            weight = module.weight()
            bias = module.bias()

            # Dequantize the weight to get the float values back
            weight = weight.dequantize()

            print(f"{name}.weight:", weight)
            print(f"{name}.bias:", bias)
            print(f"{name}.scale:", module.scale)
            print(f"{name}.zero_point:", module.zero_point)
            print()

print_quantized_weights(myModel)
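
To see why linear1.scale differs from the scale inside _packed_params, you can also read the quantization parameters of the weight tensor itself. The snippet below is just a small sketch of how one might inspect and save the quantized values; the file name is only an example.

# Quantized weight of the first layer, with its own quantization parameters
w = myModel.linear1.weight()          # quantized weight tensor (qint8)
print(w.q_scale(), w.q_zero_point())  # weight scale / zero_point (matches _packed_params)
print(w.int_repr())                   # raw int8 values of the weight
print(myModel.linear1.scale, myModel.linear1.zero_point)  # output activation scale / zero_point

# The quantized parameters can be exported like any other state_dict
torch.save(myModel.state_dict(), "hpc_quantized.pth")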


Thanks bro, I will try it.

Thank you very much, you have solved a problem that has been troubling me for a long time! :smiley:
