C++ output differs from Python output

PyTorch 1.0, Ubuntu 16.04. With the same model and the same transformations, I get totally different results from the Python and C++ code.
Can anyone help me out? Thanks a lot.

#========python output======#
tensor(9073.5586, device='cuda:0', grad_fn=)
#========c++ output======#

I put the code on Baidu Cloud.

My Chinese is too limited for me to find the sources there :wink:

Hi @sq1988,

I can’t download your files. Can you please explain your problem in more detail here?

Just as a hint, you can format code by wrapping it in triple backticks.

#include <something.h>

int main() {
    return 0;
}

Could you please also add the Python code?

#include <torch/script.h>
#include <torch/torch.h>
#include <opencv2/opencv.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/core.hpp>
#include <cassert>
#include <cstdio>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

int main(int argc, const char *argv[]) {
    torch::DeviceType device_type;
    device_type = torch::kCUDA;  // torch::kCUDA or torch::kCPU
    torch::Device device(device_type, 0);
    std::shared_ptr<torch::jit::script::Module> module = torch::jit::load("model.pt");
    assert(module != nullptr);
    std::cout << "load model ok\n";
    auto image = cv::imread("284193,c89b000bf74ce6e.jpg");
    cv::Mat image_transfomed;
    // The interpolation flag is the sixth argument of cv::resize; fx and fy stay 0 because dsize is given.
    cv::resize(image, image_transfomed, cv::Size(1024, 720), 0, 0, cv::INTER_AREA);
    cv::cvtColor(image_transfomed, image_transfomed, cv::COLOR_BGR2RGB);
    torch::Tensor tensor_image = torch::from_blob(image_transfomed.data, {image_transfomed.rows, image_transfomed.cols, 3}, torch::kByte).to(device);  // HWC
    tensor_image = tensor_image.permute({2, 0, 1});  // CHW
    tensor_image = tensor_image.toType(torch::kFloat);
    tensor_image = tensor_image.div(255.0);
    // Per-channel normalization with the same mean/std as the Python transform
    tensor_image[0] = tensor_image[0].sub_(0.485).div_(0.229);
    tensor_image[1] = tensor_image[1].sub_(0.456).div_(0.224);
    tensor_image[2] = tensor_image[2].sub_(0.406).div_(0.225);
    // Add batch dimension
    tensor_image = tensor_image.unsqueeze(0);
    std::vector<torch::jit::IValue> inputs;
    inputs.emplace_back(tensor_image);  // pass the preprocessed image to the model
    auto t = (double) cv::getTickCount();
    at::Tensor output = module->forward(inputs).toTensor().cpu();
    at::IntList sizes = output.sizes();  // shape
    // Get result
    auto count = output[0][0].sum();
    t = (double) cv::getTickCount() - t;
    // A Tensor cannot be passed to printf directly; extract the scalar first.
    printf("count:%f,execution time = %gs\n", count.item<float>(), t / cv::getTickFrequency());
    return 0;
}

import PIL.Image as Image
import torch
from torchvision import transforms

torch.backends.cudnn.benchmark = True

# Same preprocessing as the C++ code: resize, scale to [0, 1], normalize per channel
transform = transforms.Compose([
                       transforms.Resize((720, 1024)),
                       transforms.ToTensor(),
                       transforms.Normalize(mean=[0.485, 0.456, 0.406],std=[0.229, 0.224, 0.225])
])
org = Image.open("284193,c89b000bf74ce6e.jpg")
img = transform(org).cuda()
traced_script_module = torch.jit.load('model.pt')
output = traced_script_module(img.unsqueeze(0))

Any suggestion will be appreciated.
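
Both snippets above are meant to apply the same arithmetic to every pixel: scale to [0, 1], then subtract the channel mean and divide by the channel std. As a rough sketch of that formula in plain Python (the helper name `normalize_pixel` and the sample pixel are made up for illustration, and this is not how either library implements it), one could compute the expected value for a single pixel and compare it with what each pipeline actually feeds the model:

```python
MEAN = (0.485, 0.456, 0.406)  # per-channel mean used in the thread (RGB order)
STD = (0.229, 0.224, 0.225)   # per-channel std used in the thread (RGB order)

def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to the normalized values fed to the model.

    Hypothetical helper: mirrors div(255), then sub(mean), then div(std).
    """
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, MEAN, STD))

pixel = (128, 64, 255)  # hypothetical pixel, RGB order
print(normalize_pixel(pixel))
```

If the first pixel of the input tensor printed on the Python side and on the C++ side does not match this formula for the same source pixel, the mismatch is probably in the preprocessing (resize, channel order, layout) rather than in the model.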

What are the sizes of your output? Are they the same for Python and C++? If the output has more than one channel, `output[0][0].sum()` covers only the first one, so maybe you are not summing over all elements.
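
To illustrate the question, here is a small framework-free sketch using nested lists as stand-ins for tensors. The helpers `shape_of` and `total` and both sample outputs are hypothetical. It shows that indexing `[0][0]` before summing covers everything only when the first two dimensions both have size 1:

```python
def shape_of(nested):
    """Return the shape of a regular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)

def total(nested):
    """Sum every scalar in a nested list, whatever its depth."""
    if isinstance(nested, list):
        return sum(total(item) for item in nested)
    return nested

# Hypothetical stand-in for a model output of shape (1, 1, H, W)
single_map = [[[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]]
# Hypothetical output with two channels, shape (1, 2, H, W)
two_maps = [[[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
             [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]]]

for name, out in (("single_map", single_map), ("two_maps", two_maps)):
    partial = total(out[0][0])  # elements covered by indexing [0][0] first
    full = total(out)           # every element of the output
    print(name, shape_of(out), partial, full, partial == full)
```

With a (1, 1, H, W) output the two sums agree, so in that case the indexing alone would not explain a difference between the two programs.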

1,1,720,1024 for c++ and python.
Can you give me your email address so I can send you the model?

You could make your code publicly available on GitHub or something similar, and I will have a look at it.

I uploaded the code and model to Dropbox.

Yeah, sorry. I will try to check what the difference is, but I have no CUDA device available at the moment. Do you have the same problem when you run the code on a CPU? Could you save the script module with device type CPU?

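Did you use batch norm in your model? If so, you should call "module->eval()" before calling "module->forward()". In training mode, batch norm normalizes with the statistics of the current batch instead of the running statistics stored in the model.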
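
As a simplified illustration of why the mode matters, here is a plain-Python sketch of batch normalization for a single channel. It is not PyTorch's implementation: the function `batch_norm`, the epsilon value, the activations and the stored statistics are all assumptions chosen for the example, and the learned scale and shift are left out.

```python
from statistics import mean, pvariance

EPS = 1e-5  # small constant for numerical stability (assumed value)

def batch_norm(values, running_mean, running_var, training):
    """Normalize one channel (simplified: no learned scale/shift, no running-stat update)."""
    if training:
        # Training mode: statistics come from the current batch
        mu, var = mean(values), pvariance(values)
    else:
        # Eval mode: statistics stored in the model are used instead
        mu, var = running_mean, running_var
    return [(v - mu) / (var + EPS) ** 0.5 for v in values]

activations = [2.0, 4.0, 6.0, 8.0]    # hypothetical activations for one channel
running_mean, running_var = 1.0, 4.0  # hypothetical statistics stored at training time

train_out = batch_norm(activations, running_mean, running_var, training=True)
eval_out = batch_norm(activations, running_mean, running_var, training=False)
print("training mode:", [round(v, 4) for v in train_out])
print("eval mode:    ", [round(v, 4) for v in eval_out])
```

The same input yields different outputs in the two modes, so if one program runs the model in training mode and the other in eval mode, their results can diverge even with identical preprocessing.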