Understand the usage of quantized weights from quantized model

Giang_Dang · July 6, 2020, 4:03am

Sorry if this has been answered elsewhere. I couldn’t find a similar question on the forum, so I am posting mine here and hope for your answers.

We have a simple trained model and applied static quantization to it, using ‘fbgemm’ as the qconfig, to get a quantized model:
myModel.qconfig = torch.quantization.get_default_qconfig('fbgemm')

After this, we have a quantized model whose weights can be exported as integers with int_repr().
I expected that if I created a similar architecture and imported the integer weights into it, I would get the same per-layer results as the quantized model, but the results turn out to be different.

Below are the detailed flows:
# Note: x_batch and x_quant were exported earlier from an eval run of the quantized model to a pickle file and are reloaded here for comparison

    #Flow 1
    #using x as input, calculate results through loaded quantized model 
    #forward: x--> x_quant = self.quant(x) --> f = self.featExt(x_quant)
    # featExt definition: self.featExt = nn.Sequential(nn.Conv2d(1, 8, 
    #                   kernel_size=5, stride=5, bias=False), nn.ReLU())
    x_quant_new, f, x_conv, y_hat = quant_net.forward(x_batch[0])
    print('using saved quantized model: ')
    print('x_quant to compare(int): ', x_quant_new.int_repr())
    print('filter to compare(int): ', quant_net.featExt[0].weight().int_repr())
    print('output to compare(int): ', f.int_repr())

    #Flow 2
    #using x_quant as input, calculate conv 2d using pytorch function
    conv2d = nn.Conv2d(1, 8, kernel_size=5, stride=5, bias=False)
    conv2d.weight.data = my_debug_net.featConv.weight.data
    with torch.no_grad():
        conv2d.eval()
        res1 = conv2d(x_quant[0].type(torch.CharTensor)) 
    print('*********using F.conv2d***********')
    print('x_quant: ', x_quant[0])
    print('filter: ', conv2d.weight.data)
    print('F.conv2d Output ', res1)
    print('F.relu Output ', F.relu(res1))

Vasiliy_Kuznetsov · July 6, 2020, 9:41pm

This should be possible, if the weights are copied correctly. Would you have a reproducible toy example of this behavior?

Giang_Dang · July 7, 2020, 4:18am

Thanks for confirming my thinking. I can’t upload the quantized model and architecture we are currently working on, but for the purpose of demonstration I will create a toy example to share for the investigation.
For now I can share the log from the two flows in my question, which shows that the weights are the same. Perhaps with this log you will find something that I missed.
I put the log here to avoid cluttering the conversation: https://drive.google.com/drive/folders/1O7A96jJIWbqS_5uYL1tmp__N6LJHMh9k?usp=sharing

Vasiliy_Kuznetsov · July 15, 2020, 4:54pm

Hi @Giang_Dang,

Unfortunately it’s hard to spot what could be missing in your code without seeing it. Here is a toy example representing the expected behavior:

import torch
import torch.nn as nn
# toy model

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(2, 2)
        self.dequant = torch.quantization.DeQuantStub()
    
    def forward(self, x):
        x = self.quant(x)
        x = self.fc(x)
        x = self.dequant(x)
        return x

m1 = M()
m2 = M()

def static_quant(m):
    m.qconfig = torch.quantization.get_default_qconfig('fbgemm')
    torch.quantization.prepare(m, inplace=True)
    # toy calibration
    data = torch.rand(4, 2)
    m(data)
    torch.quantization.convert(m, inplace=True)
    
static_quant(m1)
static_quant(m2)
# m1 and m2 now have different weights, because of different
# initialization, and different calibration data

# verify that same inputs do not lead to same outputs
data = torch.rand(16, 2)
print('outputs match', torch.allclose(m1(data), m2(data)))

# set m2's weights to be equal to m1's weights
m2.quant.load_state_dict(m1.quant.state_dict())
m2.fc.load_state_dict(m1.fc.state_dict())

# verify that same inputs lead to same outputs
data = torch.rand(16, 2)
print('outputs match', torch.allclose(m1(data), m2(data)))

One thing you could try is to use the state dict to transfer weights between modules of the same type, instead of manually copying attributes over. However, if you manually transfer all the attributes correctly, it should work as well.

Giang_Dang · July 16, 2020, 7:02am

Hi @Vasiliy_Kuznetsov: thank you for taking the time to create the toy example.
The approach of saving the state_dict and reloading it into the same architecture, as you described, works as expected, and I have no issue with that.
To clarify, my goal is: trained PyTorch model (M) -> quantized trained PyTorch model (M1) -> port to run on an ARM Cortex-M4 with CMSIS-NN (M3).
To get there, I am doing these intermediate steps:
quantized trained PyTorch model (M2) -> export weight parameters as integers -> load to a brand new Pytorch architecture without quantized info (M2_int) -> this model will be close to what is developed on the embedded device (M3).
I will update your example to show the above steps. What I am not clear about is whether some normalization steps done inside PyTorch’s internal functions differ between the quantized and non-quantized models.

Vasiliy_Kuznetsov · July 16, 2020, 4:10pm

The state dicts don’t have to be used on the whole model, you can do it module by module, something like model2.conv3.load_state_dict(model1.conv3.state_dict()). But in any case, loading a state dict is the same thing as transferring all the attributes manually, it’s just easier.

Regarding “load to a brand new Pytorch architecture without quantized info (M2_int)”:

If you are still seeing different results after transferring the weights, there could be other differences. Some things to debug would be:

Are there other parameters you need to transfer (conv bias, etc.)?
Is the input data coming in exactly the same (are you modeling quant/dequant correctly, etc.)?
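
As a rough sketch of what "all the parameters" can mean for a quantized tensor, the helper below gathers the stored integers together with their scale and zero point. The name `export_qparams` and the stand-in tensors are hypothetical and not from this thread. In a converted model, the weight tensor would typically come from something like `m1.conv.weight()`, and the input and output scale/zero_point of each layer would need to be exported as well. The per-channel branch is included because the default 'fbgemm' qconfig is understood to quantize weights per output channel.

```python
import torch

def export_qparams(q):
    """Collect the integers and quantization parameters of a quantized tensor.

    Hypothetical helper for illustration; handles per-tensor and per-channel schemes.
    """
    out = {"int_values": q.int_repr(), "dtype": q.dtype}
    if q.qscheme() in (torch.per_channel_affine, torch.per_channel_symmetric):
        # One scale / zero_point per slice along `axis` (usually output channels).
        out["scale"] = q.q_per_channel_scales()
        out["zero_point"] = q.q_per_channel_zero_points()
        out["axis"] = q.q_per_channel_axis()
    else:
        out["scale"] = q.q_scale()
        out["zero_point"] = q.q_zero_point()
    return out

# Stand-in for a conv weight quantized per output channel (made-up values).
w = torch.randn(8, 1, 5, 5)
scales = w.abs().amax(dim=(1, 2, 3)) / 127
zero_points = torch.zeros(8, dtype=torch.int64)
w_q = torch.quantize_per_channel(w, scales, zero_points, 0, torch.qint8)

# Stand-in for a quantized activation (per-tensor, made-up scale).
x_q = torch.quantize_per_tensor(torch.rand(1, 1, 5, 5), 0.01, 0, torch.quint8)

w_export = export_qparams(w_q)
x_export = export_qparams(x_q)
print("weight scales per channel:", w_export["scale"].tolist())
print("input scale / zero_point :", x_export["scale"], x_export["zero_point"])
```

Exporting only `int_values` without the matching scale and zero point loses the information needed to interpret the integers on the target device.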

Shisho_Sama · July 18, 2020, 8:07am

Unless the two architectures are the same, you cannot expect to get the same result as your network output. You are guaranteed to get the same result for the very same layers with the same input, but anything other than that will cause the result to change.

Giang_Dang · July 21, 2020, 10:33am

Hi both,
Thank you for taking the time to look into the issue.
I totally agree with you both on the logic. I modified the program from @Vasiliy_Kuznetsov to demonstrate what I am trying to achieve. I would be grateful if this could be explained, since it is an essential step in converting a PyTorch model to a C model:

import torch
import torch.nn as nn
# toy model

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(1,1,kernel_size=2,stride=2,padding=0,bias=False)
        self.dequant = torch.quantization.DeQuantStub()
    
    def forward(self, x):
        x_quant = self.quant(x)
        x_conv = self.conv(x_quant)
        y = self.dequant(x_conv)
        return x_quant, x_conv, y

class M_int(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1,1,kernel_size=2,stride=2,padding=0,bias=False)
    
    def forward(self, x):
        # get x_quant as input
        x_conv = self.conv(x)
        return x_conv


m1 = M()
m2 = M()

def static_quant(m):
    m.qconfig = torch.quantization.get_default_qconfig('fbgemm')
    torch.quantization.prepare(m, inplace=True)
    # toy calibration
    data = torch.rand(4, 1, 2, 2)
    m(data)
    torch.quantization.convert(m, inplace=True)
    
static_quant(m1)
static_quant(m2)
# m1 and m2 now have different weights, because of different
# initialization, and different calibration data

# verify that same inputs do not lead to same outputs
data = torch.rand(4, 1, 2, 2)
print('outputs match', torch.allclose(m1(data)[2], m2(data)[2]))

# set m2's weights to be equal to m1's weights
m2.quant.load_state_dict(m1.quant.state_dict())
m2.conv.load_state_dict(m1.conv.state_dict())

# verify that same inputs lead to same outputs
data = torch.rand(4, 1, 2, 2)
print('outputs match', torch.allclose(m1(data)[2], m2(data)[2]))

m3 = M_int()
with torch.no_grad():
    m3.conv.weight.data = m1.conv.state_dict()['weight'].int_repr().type(torch.ByteTensor)
    m3.eval()
    data = torch.rand(4, 1, 2, 2)
    x_quant, x_conv, y = m1(data)
    x_conv3 = m3(x_quant.int_repr().type(torch.ByteTensor))
print('weight match', torch.allclose(m1.conv.state_dict()['weight'].int_repr().type(torch.ByteTensor), m3.conv.weight.data))
print('outputs match', torch.allclose(x_conv.int_repr(), x_conv3))

The M_int model is the fresh model with the integer weights loaded in.
I expect the result after the conv layer to be the same for m1 and m3.
I changed from linear to conv only because I am currently debugging Conv2d.

Vasiliy_Kuznetsov · July 21, 2020, 4:27pm

Hi @Giang_Dang,

I’m not sure if it makes sense conceptually to try to put weights from a quantized layer directly into a floating point layer. Consider the translation between the quantized and floating point domain:

x_quant = round(x_fp / scale + zero_point)
x_fp = (x_quant - zero_point) * scale

For the weights of the quantized conv, even though they are stored in the quantized domain, they represent the floating point domain. To use them in non-quantized layers you’d need to convert back to the floating point domain.
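
The two formulas above can be checked on a small tensor. This is a minimal sketch with made-up values for `scale`, `zero_point`, and the weights. It uses `torch.quantize_per_tensor` as a stand-in for what a converted layer stores, and compares a hand-written dequantization against `dequantize()`.

```python
import torch

# Hypothetical values chosen only for illustration.
scale, zero_point = 0.05, 3
w_fp = torch.tensor([-1.0, -0.26, 0.0, 0.4, 1.2])

# Quantize: stored integers follow round(w_fp / scale + zero_point), clamped to int8.
w_q = torch.quantize_per_tensor(w_fp, scale, zero_point, torch.qint8)
w_int = w_q.int_repr()

# Dequantize by hand: (w_int - zero_point) * scale.
w_manual = (w_int.to(torch.float32) - zero_point) * scale

print("stored integers      :", w_int.tolist())
print("manual dequantization:", w_manual.tolist())
print("matches dequantize() :", torch.allclose(w_manual, w_q.dequantize()))
```

The stored integers only carry meaning together with `scale` and `zero_point`. Feeding `w_int` into a float layer as if it were the weight itself skips that mapping.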

Giang_Dang · July 22, 2020, 3:43am

Hi @Vasiliy_Kuznetsov,

For the quant() layer, yes, I managed to figure out the formula, and it is fine to apply.
For the weights of the convolution layer, the same formula with scale and zero_point gives the int_repr() values from the float values.
The purpose of quantization is to have integer parameters and hence reduce the computation cost during convolution. If we can’t reproduce the same result in a plain network with these weights, it seems that porting to a C model is not feasible, or at least not well supported by PyTorch currently.

Cheers,
Giang

Vasiliy_Kuznetsov · July 22, 2020, 3:38pm

with torch.no_grad():
    m3.conv.weight.data = m1.conv.state_dict()['weight'].int_repr().type(torch.ByteTensor)

This line doesn’t seem to be applying the dequantization. If you want m3.conv to match m1.conv when m3 is floating point and m1 is quantized, you would need to convert the weights back to floating point. int_repr() returns the integer weights, but it does not dequantize them.
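
For the porting use case, one way to picture the missing step is to emulate a bias-free quantized conv with integer-valued arithmetic: subtract the zero points, accumulate, rescale by the product of the input and weight scales, then requantize with the output scale and zero point. The sketch below is an illustration under assumptions, not the fbgemm kernel: `int_conv2d_reference` is a hypothetical helper, all scales and zero points are made up, and per-tensor quantization is assumed for both input and weight (a per-channel weight would need one scale per output channel). On an embedded target the accumulator would normally be int32 and the rescale a fixed-point multiply.

```python
import torch
import torch.nn.functional as F

def int_conv2d_reference(x_q, w_q, out_scale, out_zero_point, stride=1):
    """Emulate a quantized conv (no bias) using integer-valued arithmetic.

    Assumes per-tensor quantization: quint8 activation, qint8 weight.
    """
    # Remove zero points so the integers are proportional to the real values.
    x_int = x_q.int_repr().to(torch.float64) - x_q.q_zero_point()
    w_int = w_q.int_repr().to(torch.float64) - w_q.q_zero_point()
    # Accumulator: every entry is an exact integer (float64 is only a container).
    acc = F.conv2d(x_int, w_int, stride=stride)
    # Rescale to real values, then requantize with the output scale / zero point.
    real = acc * (x_q.q_scale() * w_q.q_scale())
    out = torch.round(real / out_scale) + out_zero_point
    return out.clamp(0, 255).to(torch.uint8)

torch.manual_seed(0)
x_q = torch.quantize_per_tensor(torch.rand(1, 1, 4, 4), 1 / 255, 0, torch.quint8)
w_q = torch.quantize_per_tensor(torch.randn(1, 1, 2, 2), 0.02, 0, torch.qint8)
out_scale, out_zero_point = 0.05, 64  # made-up output quantization parameters

y_int = int_conv2d_reference(x_q, w_q, out_scale, out_zero_point, stride=2)

# Float reference: dequantize, convolve in float, then quantize the result.
y_ref = torch.quantize_per_tensor(
    F.conv2d(x_q.dequantize(), w_q.dequantize(), stride=2),
    out_scale, out_zero_point, torch.quint8,
).int_repr()

max_diff = (y_int.to(torch.int16) - y_ref.to(torch.int16)).abs().max().item()
print("max difference in integer outputs:", max_diff)
```

Small off-by-one differences against a real quantized kernel can remain because of rounding details, so a comparison with a tolerance of one integer step is a reasonable first check.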

One other thing you could consider is to run quantization on m3 directly.

111179 · September 6, 2020, 1:39pm

Hi, have you solved your problem? I have the same question.

eddie · December 21, 2022, 7:44am

Hi @Giang_Dang, apologies for digging up this thread but we’re facing exactly the same issue here. Were you able to solve it?