C++ output differs from python output

(sunqin) #1

PyTorch 1.0, Ubuntu 16.04. Same model, same transformation, but I get totally different results from the Python and C++ code.
Can anyone help me out? Thanks a lot.

#========python output======#
tensor(9073.5586, device='cuda:0', grad_fn=)
#========c++ output======#

I put the code on Baidu Cloud.


My knowledge of Chinese is somewhat too limited to get at the sources :wink:

(Martin Huber) #3

Hi @sq1988,

I can’t download your files. Can you please explain your problem in more detail here?

(Martin Huber) #5

Just as a hint, you can format code by wrapping it in triple backticks:

```cpp
#include <something.h>

int main() {
    return 0;
}
```
Can you please also add the Python code?

(sunqin) #6
```cpp
#include <torch/script.h>
#include <torch/torch.h>
#include <opencv2/opencv.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/core.hpp>
#include <cassert>
#include <cstdio>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

int main(int argc, const char *argv[]) {
    torch::DeviceType device_type;
    device_type = torch::kCUDA;  // torch::kCUDA or torch::kCPU
    torch::Device device(device_type, 0);
    std::shared_ptr<torch::jit::script::Module> module = torch::jit::load("model.pt");
    assert(module != nullptr);
    std::cout << "load model ok\n";

    auto image = cv::imread("284193,c89b000bf74ce6e.jpg");  // OpenCV loads BGR
    cv::Mat image_transfomed;
    // interpolation is the 6th parameter; passed 4th it lands in fx, which is
    // ignored when dsize is given, so the default INTER_LINEAR would be used
    cv::resize(image, image_transfomed, cv::Size(1024, 720), 0, 0, cv::INTER_AREA);
    cv::cvtColor(image_transfomed, image_transfomed, cv::COLOR_BGR2RGB);
    torch::Tensor tensor_image = torch::from_blob(image_transfomed.data, {image_transfomed.rows, image_transfomed.cols, 3}, torch::kByte).to(device);  // HWC
    tensor_image = tensor_image.permute({2, 0, 1});  // CHW
    tensor_image = tensor_image.toType(torch::kFloat);
    tensor_image = tensor_image.div(255.0);
    tensor_image[0] = tensor_image[0].sub_(0.485).div_(0.229);
    tensor_image[1] = tensor_image[1].sub_(0.456).div_(0.224);
    tensor_image[2] = tensor_image[2].sub_(0.406).div_(0.225);
    // add batch dimension
    tensor_image = tensor_image.unsqueeze(0);

    std::vector<torch::jit::IValue> inputs;
    inputs.emplace_back(tensor_image);  // forward() needs the image as its input
    auto t = (double) cv::getTickCount();
    at::Tensor output = module->forward(inputs).toTensor().cpu();
    at::IntList sizes = output.sizes();  // shape
    // get result
    auto count = output[0][0].sum();
    t = (double) cv::getTickCount() - t;
    // count is a Tensor, not a double: extract the scalar before handing it to printf
    printf("count:%f,execution time = %gs\n", count.item<float>(), t / cv::getTickFrequency());
    return 0;
}
```
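```python
import PIL.Image as Image
import torch
from torchvision import transforms

torch.backends.cudnn.benchmark = True

transform = transforms.Compose([
    transforms.Resize((720, 1024)),  # (height, width), i.e. cv::Size(1024, 720)
    transforms.ToTensor(),           # HWC uint8 [0, 255] -> CHW float [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

org = Image.open("284193,c89b000bf74ce6e.jpg")  # RGB for a typical colour JPEG
img = transform(org).cuda()
traced_script_module = torch.jit.load('model.pt')
output = traced_script_module(img.unsqueeze(0))
print(output[0][0].sum())  # same reduction as the C++ code
```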
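To narrow down whether the two pipelines already disagree before the model runs, the per-pixel arithmetic can be replayed without libtorch or OpenCV. The sketch below is a minimal illustration, assuming an interleaved 8-bit HWC buffer like the one `cv::Mat` holds; it applies what `ToTensor` + `Normalize` compute (divide by 255, subtract the channel mean, divide by the channel std) and writes the result in CHW order. The helper names `swap_rb` and `to_normalized_chw` and the 1x2 test image are hypothetical, made up for this example.

```python
# Per-channel statistics passed to transforms.Normalize in the Python code.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def swap_rb(hwc):
    """Swap the first and third channel of every pixel (BGR <-> RGB).

    Mirrors what cv::COLOR_BGR2RGB does to an interleaved 3-channel buffer.
    """
    out = list(hwc)
    for i in range(0, len(out), 3):
        out[i], out[i + 2] = out[i + 2], out[i]
    return out


def to_normalized_chw(hwc, height, width):
    """Replay ToTensor + Normalize on a flat, interleaved HWC uint8 buffer.

    Element (y, x, c) sits at hwc[(y * width + x) * 3 + c]; the result is a
    flat CHW list where element (c, y, x) sits at (c * height + y) * width + x.
    """
    chw = [0.0] * (3 * height * width)
    for y in range(height):
        for x in range(width):
            for c in range(3):
                v = hwc[(y * width + x) * 3 + c] / 255.0  # ToTensor: scale to [0, 1]
                chw[(c * height + y) * width + x] = (v - MEAN[c]) / STD[c]  # Normalize
    return chw


if __name__ == "__main__":
    rgb = [255, 0, 0, 0, 0, 255]  # hypothetical 1x2 image: a red pixel, then a blue pixel
    bgr = swap_rb(rgb)            # the same pixels in OpenCV's BGR order
    print("with BGR2RGB:   ", [round(v, 3) for v in to_normalized_chw(swap_rb(bgr), 1, 2)])
    print("without BGR2RGB:", [round(v, 3) for v in to_normalized_chw(bgr, 1, 2)])
```

PIL's bilinear resize and OpenCV's `INTER_AREA`/`INTER_LINEAR` are different algorithms, so the resized pixels are not expected to match exactly; a like-for-like comparison of the normalized tensors would start both programs from the same already-resized image, for example one saved once as a lossless PNG.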

(sunqin) #7

Any suggestion will be appreciated.

(Martin Huber) #8

What are the sizes of your outputs? Are they the same in Python and C++? Maybe you are not summing over all elements.
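One way to follow up on this check is to dump both outputs as a shape plus a flat list (in Python, `list(output.shape)` and `output.flatten().tolist()`; from C++, written out element by element from the CPU tensor) and compare shapes, total sums, the sums over `output[0][0]`, and the largest element-wise difference. The sketch below assumes row-major NCHW data already loaded into Python lists; `compare_outputs` and its `atol` default are hypothetical choices for illustration.

```python
from math import prod


def compare_outputs(shape_a, flat_a, shape_b, flat_b, atol=1e-4):
    """Compare two NCHW outputs given as a shape plus a flat row-major list.

    Reports the total sums, the sums over the first batch/channel plane
    (the elements output[0][0].sum() covers) and the largest element-wise
    absolute difference. Returns {"same_shape": False} if shapes differ.
    """
    if tuple(shape_a) != tuple(shape_b):
        return {"same_shape": False}
    if len(flat_a) != prod(shape_a) or len(flat_b) != prod(shape_b):
        raise ValueError("flat data does not match the declared shape")
    _, _, h, w = shape_a
    plane = h * w  # number of elements in output[0][0]
    max_diff = max((abs(x - y) for x, y in zip(flat_a, flat_b)), default=0.0)
    return {
        "same_shape": True,
        "sum_a": sum(flat_a),
        "sum_b": sum(flat_b),
        "slice_sum_a": sum(flat_a[:plane]),
        "slice_sum_b": sum(flat_b[:plane]),
        "max_abs_diff": max_diff,
        "close": max_diff <= atol,
    }


if __name__ == "__main__":
    shape = (1, 2, 2, 2)         # hypothetical 2-channel output
    a = [1.0] * 8
    b = [1.0] * 4 + [0.0] * 4    # agrees with a only in channel 0
    print(compare_outputs(shape, a, shape, b))
```

In the demo both slice sums are 4.0 while the totals are 8.0 and 4.0, which is the situation where comparing only `output[0][0].sum()` would hide a mismatch in the other channels.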

(sunqin) #10

[1, 1, 720, 1024] in both C++ and Python.
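Reading that shape as NCHW, `output[0][0]` fixes the batch and channel indices, both of extent 1, so the slice already contains every element of the output and a partial sum cannot explain the mismatch:

$$
N \times C \times H \times W = 1 \times 1 \times 720 \times 1024 = 737{,}280 = \underbrace{720 \times 1024}_{\text{elements in } \texttt{output[0][0]}}
$$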
Can you give me your email address so I can send you the model?

(Martin Huber) #11

You could make your code publicly available on GitHub or something similar, and I will have a look at it.

(sunqin) #12

I uploaded the code and model to Dropbox.

(Martin Huber) #14

Yeah, sorry. I will try to check what the difference is, but I don't have CUDA available at the moment. Do you get the same problem when you run the code on a CPU? Could you save the script module with device type CPU?