Hi, I have defined a neural network with fully connected layers and applied post-training static quantization to it. I am using PyTorch 2.0.0+cu118.
Here is the network architecture and the quantization process:
import torch
import torch.nn as nn

class HPC(nn.Module):
    def __init__(self, input_features, out_features):
        super(HPC, self).__init__()
        self.linear1 = nn.Linear(input_features, 512)
        self.linear2 = nn.Linear(512, 256)
        self.linear3 = nn.Linear(256, 64)
        self.linear4 = nn.Linear(64, out_features)
        self.sigmoid = nn.Sigmoid()
        # stubs mark where tensors enter and leave the quantized region
        self.quant = torch.ao.quantization.QuantStub()
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, n_input):
        out = self.quant(n_input)
        out = self.linear1(out)
        out = self.linear2(out)
        out = self.linear3(out)
        out = self.sigmoid(self.linear4(out))
        out = self.dequant(out)
        return out
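For context, here is a minimal instantiation consistent with the shapes above (the 504 input features are inferred from the weight shape printed further down; the output size is just a placeholder):

# hypothetical values: 504 matches the (512, 504) weight shape shown later,
# and out_features=10 is a placeholder chosen for illustration
model = HPC(input_features=504, out_features=10).cuda()  # the model lives on the GPU in my setup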
import copy

# copy the trained model and switch to eval mode
myModel = copy.deepcopy(model)
myModel.eval()
myModel.qconfig = torch.ao.quantization.default_qconfig
print(myModel.qconfig)

# insert observers
torch.ao.quantization.prepare(myModel, inplace=True)

# calibrate on 200 training samples so the observers record activation ranges
for i in range(200):
    X_train_arr_tensor = torch.Tensor(X_train_arr[i].flatten()).cuda()
    X_train_arr_tensors = torch.unsqueeze(X_train_arr_tensor, 0)
    myModel(X_train_arr_tensors)

# replace modules with their quantized counterparts
torch.ao.quantization.convert(myModel, inplace=True)
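To sanity-check the conversion, printing a layer should now show a quantized module (roughly what I would expect; the exact repr may differ across versions):

print(myModel.linear1)
# expected output, roughly:
# QuantizedLinear(in_features=504, out_features=512, scale=..., zero_point=..., qscheme=torch.per_tensor_affine)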
After completing the above steps, I printed the keys of the quantized model's state_dict; the result is shown below.
for key in myModel.state_dict():
    print(key)
linear1.scale
linear1.zero_point
linear1._packed_params.dtype
linear1._packed_params._packed_params
linear2.scale
linear2.zero_point
linear2._packed_params.dtype
linear2._packed_params._packed_params
linear3.scale
linear3.zero_point
linear3._packed_params.dtype
linear3._packed_params._packed_params
linear4.scale
linear4.zero_point
linear4._packed_params.dtype
linear4._packed_params._packed_params
leaky.scale
leaky.zero_point
quant.scale
quant.zero_point
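So each linear layer carries an output scale, a zero_point, and a set of packed parameters. Individual entries can be looked up by key (keys taken from the listing above):

sd = myModel.state_dict()
print(sd["linear1.scale"], sd["linear1.zero_point"])
print(sd["linear1._packed_params._packed_params"])  # (quantized weight, bias) pair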
When I printed out myModel.linear1._packed_params, I found that the scale and zero_point shown inside _packed_params do not match myModel.linear1.scale and myModel.linear1.zero_point. (My guess is that the values inside _packed_params belong to the int8 weight tensor, while linear1.scale and linear1.zero_point describe the layer's output activations, but I am not sure.)
print(myModel.linear1._packed_params)
(tensor([[-0.0381, -0.0122, 0.0366, …, -0.0059, 0.0067, -0.0362],
[ 0.0196, 0.0037, -0.0277, …, 0.0074, -0.0362, 0.0126],
…,
[ 0.0126, 0.0403, -0.0104, …, 0.0359, -0.0107, 0.0263]],
device='cuda:0', size=(512, 504), dtype=torch.qint8,
quantization_scheme=torch.per_tensor_affine, scale=0.00036982630263082683,
zero_point=0), Parameter containing:
tensor([-2.7267e-02, -4.1525e-02, 3.3713e-02, …, -3.9943e-02], device='cuda:0', requires_grad=True))
print("this is scale: ",myModel.linear1.scale)
print("this is zero_point:",myModel.linear1.zero_point)
this is scale: 0.030707480385899544
this is zero_point: 63
The state_dict does not contain plain weight and bias entries for the fully connected layers. How can I obtain the quantized weights and parameters of each layer?
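For reference, this is how I would expect to read them back, based on my reading of the eager-mode quantized Linear API (the weight()/bias() accessors and torch.int_repr are my assumption here); is this the right approach?

# sketch: read back the quantized weight and bias of one layer
w = myModel.linear1.weight()   # quantized weight tensor (dtype=torch.qint8)
b = myModel.linear1.bias()     # bias is kept as a float tensor
print(torch.int_repr(w))       # raw int8 values of the weight
print(w.q_scale(), w.q_zero_point())  # the weight's own scale/zero_point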