libtorch results differ from PyTorch

I want to deploy my model in C++, but the results I get from libtorch differ from those in PyTorch. I get 90% accuracy in PyTorch but only 20% in C++. What could cause the difference?
I feed the model input of the same size in both cases. Here is my code for building the input:

torch::Tensor tensor_image = torch::from_blob(, { 300, 800, 3 }, torch::kByte);
tensor_image = tensor_image.permute({ 2,0,1});
tensor_image = tensor_image.toType(torch::kFloat);
tensor_image =;

torch::Tensor tensor_asc = torch::from_blob(, { 300, 800}, torch::kByte);
tensor_asc = tensor_asc.toType(torch::kFloat);
tensor_asc =;

cout << "The data has been transformed to tensor\n";

tensor_asc = torch::unsqueeze(tensor_asc, 0);
torch::Tensor inputs = torch::cat({ tensor_image, tensor_asc }, 0);
inputs = torch::stack({ inputs, inputs }, 0);
inputs = torch::unsqueeze(inputs, 0);

I need to concatenate two images, and I did the same in Python for PyTorch, where the inference accuracy is 90%.

How did you load the image in C++?
If you used OpenCV, note that its default color channel order is BGR, while torchvision uses PIL by default, which uses RGB.
If that might be the case, then you would have to convert the color channels in your C++ implementation.
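As a minimal sketch of the channel swap in Python, assuming the image is a NumPy array in height x width x channel layout (the layout cv2.imread returns). The array below is a synthetic stand-in, not a real image:

```python
import numpy as np

# Synthetic 2x2 "image" in BGR order (stand-in for what cv2.imread returns).
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255  # channel 0 is blue in BGR order

# Reverse the last axis to get RGB; copy() makes the result contiguous in memory.
rgb = bgr[..., ::-1].copy()

print(rgb[0, 0])  # blue now sits in channel 2
```

The same swap has to happen on the C++ side before the buffer is wrapped in a tensor, otherwise the model sees its channels in a different order than during training.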

@ptrblck, I noticed that OpenCV changes the image channel order, and I have added cvtColor(img, img, COLOR_RGB2BGR); to correct it. The results in C++ are still bad. Would converting the model from ".pth" to ".pt" affect the accuracy?

No, the file suffix shouldn’t change anything.
Did you make sure to apply exactly the same preprocessing pipeline in C++ as was used in Python (normalization etc.)?

@ptrblck Thanks, I will check my code carefully. Do you mean that the results from libtorch should be almost the same as those from PyTorch?

If you are using the same preprocessing and loading the same scripted or traced model, then the results should be the same up to the limited floating point precision.
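One way to check this is to run a single identical input through both sides, save the outputs, and compare them numerically. A minimal sketch, assuming both outputs have already been loaded into NumPy arrays. The helper name `compare_outputs` and the tolerance value are hypothetical choices, not part of any library:

```python
import numpy as np

def compare_outputs(py_out, cpp_out, atol=1e-4):
    """Return (max absolute difference, whether it is within atol)."""
    py_out = np.asarray(py_out, dtype=np.float64)
    cpp_out = np.asarray(cpp_out, dtype=np.float64)
    if py_out.shape != cpp_out.shape:
        raise ValueError(f"shape mismatch: {py_out.shape} vs {cpp_out.shape}")
    max_diff = float(np.max(np.abs(py_out - cpp_out)))
    return max_diff, max_diff <= atol

# Stand-in data: the second array differs only by rounding-level noise.
reference = np.linspace(0.0, 1.0, 12, dtype=np.float32).reshape(3, 4)
noisy = reference + np.float32(1e-6)
max_diff, ok = compare_outputs(reference, noisy)
print(max_diff, ok)
```

A difference far above the tolerance on the very first input usually points to the preprocessing or the exported model rather than to floating point noise.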

I made a pointer mistake in C++. The real accuracy is 79.8%, which is close to the 82.05% I get when I use this code:

output = scripted_model(torch.from_numpy(input_data).cuda())

to run the inference. However, both accuracies are lower than the 96.7% I get when I use the PyTorch code for inference.

What is the difference between the posted code snippet and the other code you are using to achieve the 96% accuracy?

The whole code is as follows:

import cv2
import time
import torch
import numpy as np
from ptsemseg.models import get_model
from ptsemseg.utils import convert_state_dict

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = get_model({'arch': 'hardnet'}, 3).to(device)
the_model = torch.load("./weights/hardnet_2D_best_model.pkl")
state = convert_state_dict(the_model['model_state'])
model = model.eval()
input_shape = [1, 2, 4, 300, 800]
input_data = torch.randn(input_shape, device='cuda')
scripted_model = torch.jit.trace(model, input_data)
start_time = time.perf_counter()
# The input data was processed in the same way.
output = scripted_model(torch.from_numpy(input_data).cuda())
elapsed_time = time.perf_counter() - start_time

I computed the accuracy from the output, and it is only 82.05%.

The 96.7% was obtained from PyTorch inference with different code.

In that case I would recommend comparing the posted code to the other code, which yields the higher accuracy, and trying to narrow down potential discrepancies between them.

Thanks for your advice! I will try what you suggested.

@ptrblck, I checked the code, and the data preprocessing is the same as in PyTorch. I also found that there are two ways to convert a PyTorch model to TorchScript: tracing and annotation (scripting). I used tracing. The official documentation says that tracing does not capture control flow such as "if/else", which appears several times in my code. Is it really because of this control-flow limitation that the traced TorchScript model can't reach the same accuracy as PyTorch?

That might be the case, as only the executed forward pass will be traced, and other paths through the model won't be used.
Therefore you should use a scripted model, which will capture the conditions.

Would the annotated (scripted) model solve the problem and let the control flow work correctly?

Yes, scripting will record the control flow (torch.jit.script).
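To illustrate the difference, here is a minimal sketch with a toy module. `Gate` is a hypothetical example and not the HarDNet model from this thread. Tracing is expected to emit a TracerWarning about converting a tensor to a Python boolean, which is exactly the hint that a branch is being frozen:

```python
import torch

class Gate(torch.nn.Module):
    # Hypothetical toy module with a data-dependent branch.
    def forward(self, x):
        if x.sum() > 0:
            return x * 2
        return x + 100

model = Gate().eval()
pos = torch.ones(3)
neg = -torch.ones(3)

# Tracing records only the branch taken for the example input `pos`.
traced = torch.jit.trace(model, pos)
# Scripting compiles the Python source, so both branches are kept.
scripted = torch.jit.script(model)

print("eager   :", model(neg))
print("traced  :", traced(neg))    # follows the frozen `x * 2` branch
print("scripted:", scripted(neg))  # follows the same branch as eager mode
```

If the traced and eager outputs disagree on inputs that take a different branch than the example input, scripting the model (or the affected submodules) is the usual fix.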

@ptrblck Thank you for your patient reply. I have an idea of how to implement it now!

Make sure you divide by 255 to normalize your input before you run any processing.
That is, make sure you have done something like this:

tensor_image = tensor_image.toType(torch::kFloat).div(255);

This is most likely the cause! Let us know if this was it!
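For comparison, the Python side needs the same steps in the same order. A minimal sketch with NumPy, assuming the training pipeline scaled pixels to [0, 1] (an assumption, since the thread does not show the training preprocessing) and using the 300 x 800 image size from the C++ code above. `to_model_input` is a hypothetical helper:

```python
import numpy as np

def to_model_input(image_hwc_uint8):
    """uint8 H x W x C image -> float32 C x H x W array scaled to [0, 1]."""
    chw = np.transpose(image_hwc_uint8, (2, 0, 1))  # mirrors permute({2, 0, 1})
    return chw.astype(np.float32) / 255.0           # mirrors toType(kFloat).div(255)

# Stand-in image: every pixel at the maximum uint8 value.
image = np.full((300, 800, 3), 255, dtype=np.uint8)
x = to_model_input(image)
print(x.shape, x.dtype, x.max())
```

If training used mean/std normalization on top of this scaling, the same subtraction and division would have to be applied in C++ as well.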