Trying to transfer a quantized PyTorch model to rknn

Hello everyone! I know this is a slightly off-topic question; I’ve already opened an issue in the rknn repo, but my hopes of getting an answer there are not very high, so I’ve decided to try my luck here.

General description

If we take two PyTorch-quantized models of the same architecture:

  • the first one prepared for QAT and fine-tuned,
  • the second one converted without any tuning or calibration, i.e. quantized with the default scales and zero points,

then the second model, converted to rknn, outputs exactly the same results as its torch counterpart, while the first model’s outputs differ between rknn and pytorch.

The question

Can I tackle this problem by changing something in my quantization process, e.g. the qconfig?
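Just to make the question concrete, here is a minimal sketch of what I mean by changing the qconfig; the particular observer/qscheme/range choices below are only illustrative assumptions, not a known fix:

import torch

# Illustrative only: an explicit QConfig that exposes the knobs (observer,
# dtype, qscheme, quant range) that could be varied instead of using the
# stock get_default_qat_qconfig("qnnpack").
custom_qconfig = torch.quantization.QConfig(
    activation=torch.quantization.FakeQuantize.with_args(
        observer=torch.quantization.MovingAverageMinMaxObserver,
        quant_min=0, quant_max=255,
        dtype=torch.quint8, qscheme=torch.per_tensor_affine,
    ),
    weight=torch.quantization.FakeQuantize.with_args(
        observer=torch.quantization.MovingAverageMinMaxObserver,
        quant_min=-128, quant_max=127,
        dtype=torch.qint8, qscheme=torch.per_tensor_symmetric,
    ),
)

# quantized_model.qconfig = custom_qconfig  # instead of the default QAT qconfig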

Model weights

Details

Environment

  • rknn-toolkit==1.7.1
  • torch==1.9.0+cu111
  • torchvision==0.10.0+cu111

Scenarios

  1. Quantize the PyTorch model via QAT, then transfer to rknn
  2. Transfer to rknn without any tuning or calibration

Scenario 1

How I obtain pytorch quantized model

Basically, I just train a model prepared for QAT for a few epochs on CIFAR-10. The script with the implemented training loop can be found here

import torch
import torchvision

# num_classes, train_loader, test_loader, cuda_device, cpu_device and
# train_model come from the full training script linked above.
quantized_model = torchvision.models.quantization.resnet18(pretrained=False, num_classes=num_classes)
quantized_model.fuse_model()

quantization_config = torch.quantization.get_default_qat_qconfig("qnnpack")
quantized_model.qconfig = quantization_config

torch.quantization.prepare_qat(quantized_model, inplace=True)

# QAT fine-tuning on the training data.
print("Training QAT Model...")
quantized_model.train()
# Just a simple training loop
train_model(model=quantized_model,  
            train_loader=train_loader,  
            test_loader=test_loader,  
            device=cuda_device,  
            learning_rate=1e-3,  
            num_epochs=2)  

quantized_model.to(cpu_device)    
quantized_model = torch.quantization.convert(quantized_model, inplace=True)  
quantized_model.eval()  
    
  
# Save quantized model.  
input_ = torch.rand(size=(1, 3, 32, 32)).to('cpu')  
traced_model = torch.jit.trace(quantized_model, input_)  
torch.jit.save(traced_model, 'resnet18_quantized_cifar10.pt')
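Before tracing, I can also dump the per-layer quantization parameters of the converted model to see what QAT actually changed compared to the untuned model from Scenario 2; a small sketch on top of the snippet above (run right after torch.quantization.convert):

# Hedged sketch: print the activation scale / zero_point of every quantized
# module in the converted model. Comparing these against the untuned model
# shows which layers QAT actually re-calibrated.
for name, module in quantized_model.named_modules():
    if hasattr(module, 'scale') and hasattr(module, 'zero_point'):
        print(name,
              'scale:', float(module.scale),
              'zero_point:', int(module.zero_point))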

How I convert quantized pytorch model to rknn

To transfer the quantized PyTorch model to rknn, I just rewrote the example provided in the rknn-toolkit repo.

Image preprocessing for both models is the same. The only difference is the data format: ‘nhwc’ for the rknn model and ‘nchw’ for the pytorch model (this difference is also present in the example mentioned above).

import torch  
import cv2  
import numpy as np  
from rknn.api import RKNN  
import torchvision  
  
mean = [0.485, 0.456, 0.406]  
std=[0.229, 0.224, 0.225]  
  
mean = [m*255 for m in mean]  
std = [s*255 for s in std]  
  
model_path = 'resnet18_quantized_cifar10.pt'   
  
def show_outputs(output):  
    output_sorted = sorted(output, reverse=True)  
    top5_str = '\n-----TOP 5-----\n'  
    for i in range(5):  
        value = output_sorted[i]  
        index = np.where(output == value)  
        for j in range(len(index)):  
            if (i + j) >= 5:  
                break  
            if value > 0:  
                topi = '{}: {}\n'.format(index[j], value)  
            else:  
                topi = '-1: 0.0\n'  
            top5_str += topi  
    print(top5_str)  
  
def show_perfs(perfs):  
    perfs = 'perfs: {}\n'.format(perfs)  
    print(perfs)  
  
def softmax(x):  
    return np.exp(x)/sum(np.exp(x))  
  
def main():  
  
    print('*'*20)  
    print("NOTE:")  
    print("    To run this demo, it's recommanded to use PyTorch1.9.0 and Torchvision0.10.0")  
    print('*'*20)  
  
    # prepare_model()  
    rknn = RKNN(verbose=False, verbose_file='./verbose_log.txt')  
  
    # pre-process config  
    print('--> Set config model')  
    rknn.config(quantize_input_node=False,
                merge_dequant_layer_and_output_node=False,
                # mean_values=[mean],
                # std_values=[std],
                # optimization_level=0,
                # quantized_dtype='dynamic_fixed_point-i8',
                # target_platform='rv1126',
                )
    print('done')  
  
    # Load Pytorch model  
    print('--> Loading model')  
    ret = rknn.load_pytorch(model=model_path, input_size_list=[[3, 32, 32]])  
    if ret != 0:  
        print('Load Pytorch model failed!')  
        exit(ret)  
    print('done')  
  
    # Build model  
    print('--> Building model')  
    ret = rknn.build(do_quantization=False,  
                     # dataset='dataset.txt',  
                     pre_compile=False)  
    if ret != 0:  
        print('Build model failed!')  
        exit(ret)  
    print('done')  
  
    # Export RKNN model  
    print('--> Export RKNN model')  
    ret = rknn.export_rknn('resnet18_quant.rknn')  
    if ret != 0:  
        print('Export resnet18_quant.rknn failed!')  
        exit(ret)  
    print('done')  
  
  
    # Init runtime environment  
    print('--> Init runtime environment')  
    ret = rknn.init_runtime()  
    # ret = rknn.init_runtime(target='rv1126', device_id='fa647528590c7546')  
    if ret != 0:  
        print('Init runtime environment failed')  
        exit(ret)  
    print('done')  
  
    # Set inputs  
    img = cv2.imread('./data/test.png')  
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32)  
    img = cv2.resize(img, (32, 32))  
    for i in range(3):  
        img[:,:,i] = (img[:,:,i] - mean[i])/std[i]  
      
    # Inference  
    print('--> Running model')  
    outputs = rknn.inference(inputs=[img])  
    print('rk_result:')  
    print(np.array(outputs[0][0]))  
    show_outputs(softmax(np.array(outputs[0][0])))  
  
    torch.backends.quantized.engine = 'qnnpack'  
    pt_model = torch.jit.load(model_path).eval()  
    img = cv2.imread('./data/test.png')  
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32)  
    img = cv2.resize(img, (32, 32))  
    for i in range(3):  
        img[:, :, i] = (img[:, :, i] - mean[i])/std[i]  
  
    input = torch.from_numpy(img).reshape(1, *img.shape)  
    input = input.permute(0, 3, 1, 2)  
  
    pt_result = pt_model(input)  
    pt_result = torch.dequantize(pt_result)  
    print('pt_result:')  
    print(pt_result[0].numpy())  
    show_outputs(softmax(pt_result[0].numpy()))  
  
  
if __name__ == '__main__':  
    main()

The conversion script yields the following result

********************
NOTE:
    To run this demo, it's recommended to use PyTorch1.9.0 and Torchvision0.10.0
********************
W Detect torchvision(0.10.0). If the python program got segment fault error, try to <import tensorflow> before <import torchvision>
--> Set config model
done
--> Loading model
./saved_models/resnet18_quantized_cifar10.pt ********************
W Pt model version is 1.6(same as you can check through <netron>), but the installed pytorch is 1.9.0+cu111. This may cause the model to fail to load.
W The pt model is quantized. User can try to set 'quantize_input_node=True' and 'merge_dequant_layer_and_output_node=True' in rknn.config. These setting may accelerate the inferencing on rknpu devices
done
--> Building model
W The target_platform is not set in config, using default target platform rk1808.
done
--> Export RKNN model
done
--> Init runtime environment
librknn_runtime version 1.7.1 (bd41dbc build: 2021-10-28 16:15:23 base: 1131)
done
--> Running model
rk_result:
[-0.3244629 -1.5761719  1.0664062  0.3244629  1.1123047  0.9736328
 -1.9003906  1.6689453 -1.1123047 -0.1854248]

-----TOP 5-----
[7]: 0.30284440517425537
[4]: 0.17356957495212555
[2]: 0.16578304767608643
[5]: 0.15109467506408691
[3]: 0.07894383370876312

[W TensorImpl.h:1156] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
pt_result:
[-0.2781416  -1.6688495   1.0198525   0.32449853  1.0662094   1.0198525
 -1.8079203   1.5761356  -1.2516371  -0.2781416 ]

-----TOP 5-----
[7]: 0.28748819231987
[4]: 0.1726481318473816
[2 5]: 0.1648273766040802
[2 5]: 0.1648273766040802
[3]: 0.08223201334476471

As you can see, the results differ quite a lot.
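To put a number on the gap, here is a quick sketch that compares the two output vectors printed above (the values are copied from the logs):

import numpy as np

# Raw logits copied from the rk_result / pt_result blocks above.
rk = np.array([-0.3244629, -1.5761719, 1.0664062, 0.3244629, 1.1123047,
               0.9736328, -1.9003906, 1.6689453, -1.1123047, -0.1854248])
pt = np.array([-0.2781416, -1.6688495, 1.0198525, 0.32449853, 1.0662094,
               1.0198525, -1.8079203, 1.5761356, -1.2516371, -0.2781416])

print('max abs diff:', np.max(np.abs(rk - pt)))
print('sum abs diff:', np.sum(np.abs(rk - pt)))
print('cosine sim  :', np.dot(rk, pt) / (np.linalg.norm(rk) * np.linalg.norm(pt)))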

Scenario 2

How I obtain pytorch quantized model

import torch  
import torchvision  
  
myModel = torchvision.models.quantization.resnet18(pretrained=True)  
myModel.train()  
  
# Fuse Conv, bn and relu  
myModel.fuse_model()  
  
# Specify quantization configuration    
myModel.qconfig = torch.quantization.get_default_qat_qconfig('qnnpack')  
# myModel.qconfig.qscheme = torch.per_tensor_symmetric  
print(myModel.qconfig)  
torch.quantization.prepare_qat(myModel, inplace=True)  
  
# No tuning, just straight convert with default parameters  
myModel.eval()  
torch.quantization.convert(myModel, inplace=True)  
  
input = torch.ones(1, 3, 224, 224)  
traced_model = torch.jit.trace(myModel, input)  
torch.jit.save(traced_model, 'resnet18_quant.pt')

How I convert quantized pytorch model to rknn

The conversion script is pretty much the same; the main difference is the input size.

import torch  
import cv2  
import numpy as np  
from rknn.api import RKNN  
import torchvision  
  
mean = [0.485, 0.456, 0.406]  
std = [0.229, 0.224, 0.225]  
  
mean = [m * 255 for m in mean]  
std = [s * 255 for s in std]  
  
model_path = 'resnet18_quant.pt'  
  
  
def show_outputs(output):  
    output_sorted = sorted(output, reverse=True)  
    top5_str = '\n-----TOP 5-----\n'  
    for i in range(5):  
        value = output_sorted[i]  
        index = np.where(output == value)  
        for j in range(len(index)):  
            if (i + j) >= 5:  
                break  
            if value > 0:  
                topi = '{}: {}\n'.format(index[j], value)  
            else:  
                topi = '-1: 0.0\n'  
            top5_str += topi  
    print(top5_str)  
  
  
def show_perfs(perfs):  
    perfs = 'perfs: {}\n'.format(perfs)  
    print(perfs)  
  
  
def softmax(x):  
    return np.exp(x) / sum(np.exp(x))  
  
  
def main():  
    print('*' * 20)  
    print("NOTE:")  
    print("    To run this demo, it's recommanded to use PyTorch1.9.0 and Torchvision0.10.0")  
    print('*' * 20)  
  
    # prepare_model()  
    rknn = RKNN(verbose=False, verbose_file='./verbose_log.txt')  
  
    # pre-process config  
    print('--> Set config model')  
    rknn.config(quantize_input_node=False,
                merge_dequant_layer_and_output_node=False,
                # mean_values=[mean],
                # std_values=[std],
                # optimization_level=0,
                # quantized_dtype='dynamic_fixed_point-i8',
                # target_platform='rv1126',
                )
    print('done')  
  
    # Load Pytorch model  
    print('--> Loading model')  
    ret = rknn.load_pytorch(model=model_path, input_size_list=[[3, 224, 224]])  
    if ret != 0:  
        print('Load Pytorch model failed!')  
        exit(ret)  
    print('done')  
  
    # Build model  
    print('--> Building model')  
    ret = rknn.build(do_quantization=False,  
                     # dataset='dataset.txt',  
                     pre_compile=False)  
    if ret != 0:  
        print('Build model failed!')  
        exit(ret)  
    print('done')  
  
    # Export RKNN model  
    print('--> Export RKNN model')  
    ret = rknn.export_rknn('resnet18_quant.rknn')  
    if ret != 0:  
        print('Export resnet18_quant.rknn failed!')  
        exit(ret)  
    print('done')  
  
    # Init runtime environment  
    print('--> Init runtime environment')  
    ret = rknn.init_runtime()  
    # ret = rknn.init_runtime(target='rv1126', device_id='fa647528590c7546')  
    if ret != 0:  
        print('Init runtime environment failed')  
        exit(ret)  
    print('done')  
  
    # Set inputs  
    img = cv2.imread('./dog_224x224.jpg')  
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32)  
    img = cv2.resize(img, (224, 224))  
    for i in range(3):  
        img[:, :, i] = (img[:, :, i] - mean[i]) / std[i]  
  
    # Inference
    print('--> Running model')  
    outputs = rknn.inference(inputs=[img])  
    print('rk_result:')  
    show_outputs(softmax(np.array(outputs[0][0])))  
  
    torch.backends.quantized.engine = 'qnnpack'  
    pt_model = torch.jit.load(model_path).eval()  
    img = cv2.imread('./dog_224x224.jpg')  
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32)  
    img = cv2.resize(img, (224, 224))  
    for i in range(3):  
        img[:, :, i] = (img[:, :, i] - mean[i]) / std[i]  
  
    input = torch.from_numpy(img).reshape(1, *img.shape)  
    input = input.permute(0, 3, 1, 2)  
  
    pt_result = pt_model(input)  
    pt_result = torch.dequantize(pt_result)  
    print('pt_result:')  
    show_outputs(softmax(pt_result[0].numpy()))  
  
    print('diff:')  
    print(np.sum(np.abs(pt_result[0].numpy() - np.array(outputs[0][0]))))  
  
  
if __name__ == '__main__':  
    main()

The conversion script yields the following result

********************
NOTE:
    To run this demo, it's recommended to use PyTorch1.9.0 and Torchvision0.10.0
********************
W Detect torchvision(0.10.0). If the python program got segment fault error, try to <import tensorflow> before <import torchvision>
--> Set config model
done
--> Loading model
resnet18_quant.pt ********************
W Pt model version is 1.6(same as you can check through <netron>), but the installed pytorch is 1.9.0+cu111. This may cause the model to fail to load.
W The pt model is quantized. User can try to set 'quantize_input_node=True' and 'merge_dequant_layer_and_output_node=True' in rknn.config. These setting may accelerate the inferencing on rknpu devices
done
--> Building model
W The target_platform is not set in config, using default target platform rk1808.
done
--> Export RKNN model
done
--> Init runtime environment
librknn_runtime version 1.7.1 (bd41dbc build: 2021-10-28 16:15:23 base: 1131)
done
--> Running model
rk_result:

-----TOP 5-----
[530]: 0.19574490189552307
[409 418 723]: 0.07201052457094193
[409 418 723]: 0.07201052457094193
[409 418 723]: 0.07201052457094193
[446 626 754 818 892]: 0.026491189375519753

[W TensorImpl.h:1156] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
pt_result:

-----TOP 5-----
[530]: 0.19574490189552307
[409 418 723]: 0.07201052457094193
[409 418 723]: 0.07201052457094193
[409 418 723]: 0.07201052457094193
[446 626 754 818 892]: 0.026491189375519753

diff:
0.0

This time, the results between models match perfectly.

Hey,

I’m not familiar with rknn. Does this pattern hold for toy models as well, or is there a specific piece of the ResNet that is causing the issue?