C++ Inference using OpenCV Data

Hi,

I’m trying to use a model which I trained in Python for inference in C++. In Python, the model produces reasonable outputs. However, when using it for inference in C++, the output does not make sense. My feeling is that the input/output formats are somehow wrong (type, size, etc.). In the following, I will detail the steps for:

  • Exporting the model via tracing (in Python)
  • Loading the model (in C++)
  • Preparing the input data for the model (from OpenCV data structures in C++)
  • Forward pass
  • Converting the output data of the network back to OpenCV

Any idea or help would be greatly appreciated. I have already checked a lot of resources, both here on the forum and on Stack Overflow, but based on what I found I cannot identify the error.

Thanks in advance,
Thorsten

Exporting the model

model = model.to('cpu')
model.train(False)
example = torch.rand(1, 3, height, width)
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save(save_filename)

I move the model to the CPU, as this is where inference will be done (does it matter?).
I disable training mode, as the model uses layers with training-specific behavior, such as batch normalization.

Loading model

std::ifstream is(network_file_path, std::ifstream::binary);
std::shared_ptr<torch::jit::script::Module> network = torch::jit::load(is);
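As an aside, torch::jit::load also accepts a file path directly; a minimal sketch with a basic error check (using the same network_file_path as above):

std::shared_ptr<torch::jit::script::Module> network;
try {
    // Overload taking the file path directly instead of a stream
    network = torch::jit::load(network_file_path);
} catch (const std::exception& e) {
    std::cerr << "Could not load the traced model: " << e.what() << std::endl;
}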

Input data preparation

There are three data channels which are stacked like a three-layer image, except that the channels do not represent RGB (or another color format) but our own data.

data0, data1 and data2 contain 8-bit data (value range 0 to 255). For the training, this was converted to float and scaled to the range 0 to 1.0. All channels have the same size.

So, converting to float and scaling:

cv::Mat channel0(data0.size(), CV_32FC1);
cv::Mat channel1(data1.size(), CV_32FC1);
cv::Mat channel2(data2.size(), CV_32FC1);

float scale_factor = 1 / 255.0;

data0.convertTo(channel0, CV_32F, scale_factor);
data1.convertTo(channel1, CV_32F, scale_factor);
data2.convertTo(channel2, CV_32F, scale_factor);

Now, the three channels are combined to one cv::Mat with three channels:

std::vector<cv::Mat> network_input_channels;
network_input_channels.push_back(channel0);
network_input_channels.push_back(channel1);
network_input_channels.push_back(channel2);

cv::Mat network_input(channel0.size(), CV_32FC3);
cv::merge(network_input_channels, network_input);    
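One thing that may be worth checking here (a sketch): torch::from_blob assumes one contiguous block of memory, so the merged Mat should be continuous before it is wrapped:

if (!network_input.isContinuous()) {
    network_input = network_input.clone(); // clone() returns a continuous copy
}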

The input is converted to a tensor:

std::vector<int64_t> sizes = {1, 3, network_input.rows, network_input.cols};
at::TensorOptions options(at::kFloat);
at::Tensor input_tensor = torch::from_blob(network_input.data, at::IntList(sizes), options);

std::vector<torch::jit::IValue> inputs;
inputs.push_back(input_tensor);

When I examine the values of the input tensor via std::cout and compare them with the expected values from the channels above, the values are different. Therefore, I suspect that the error has already happened by this point. However, I cannot rule out that the data is just interpreted differently by the tensor and everything is still fine. For the sake of completeness, here are the next steps in the pipeline.
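This is roughly how I inspect the values (a sketch; the slice calls assume the layout really were 1 x 3 x H x W):

// Print a small corner of the tensor and the corresponding OpenCV values
std::cout << input_tensor.slice(/*dim=*/2, 0, 3).slice(/*dim=*/3, 0, 3) << std::endl;
std::cout << channel0.at<float>(0, 0) << " "
          << channel1.at<float>(0, 0) << " "
          << channel2.at<float>(0, 0) << std::endl;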

Forward pass

auto output_tensor = network->forward(inputs).toTensor();

Converting data back to OpenCV data structures

cv::Mat output_tmp(cv::Size(width, height) , CV_32FC1, output_tensor.data<float>());
cv::Mat network_output(output_tmp.size(), CV_8UC1);
float scale_factor_up = 255.0;
output_tmp.convertTo(network_output, CV_8U, scale_factor_up);
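For completeness, a more defensive version of this conversion (a sketch; it does not explain the saturation described below): the tensor returned by forward() is not guaranteed to be contiguous, and the cv::Mat constructed from the raw pointer does not own the memory.

output_tensor = output_tensor.contiguous();  // make sure the memory layout matches
cv::Mat output_view(cv::Size(width, height), CV_32FC1, output_tensor.data<float>());
cv::Mat output_copy = output_view.clone();   // clone so the Mat owns its own data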

As expected, the values in the output do not make sense. Most of the output is saturated (maxed out or zero). I blame the wrong input for that rather than the model, because in Python I get good results.

If anybody could give me any hints I would be very thankful for your help.

Best,
Thorsten

Follow-up remarks:
PyTorch version: 1.0
Python installation of PyTorch: using Anaconda
CUDA version: 9.0 (only used for training)
Libtorch was compiled from source. It is not possible to use the downloadable version because our project also relies on Boost. The Boost/PyTorch issue is discussed, e.g., in the thread “Libtorch does not link together with boost”.

OpenCV is typically HWC (channels last), and BGR at that. Print out a small (e.g. 3x3x3) subtensor of the C++ and Python inputs; they must match.
If you can’t match those, check the preprocessing, e.g. print the mean and std. Those should match, too. You might be able to absorb the preprocessing into the model (by reimplementing it).
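In C++, that check might look roughly like this (a sketch using the input_tensor from the original post):

// Global statistics of the C++ input; compare against the same numbers in Python
std::cout << "mean: " << input_tensor.mean().item<float>()
          << "  std: " << input_tensor.std().item<float>() << std::endl;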

Best regards

Thomas

Thanks for your reply. Yes, that seems to be the issue. Inspecting the values confirms your thought: the dimension ordering from OpenCV does not match.

I see two possible solutions:

  • Create three 2D tensors and merge them (instead of merging in OpenCV and creating a tensor afterwards). How can I merge the tensors?
  • Reshape/reorder the existing tensor which was created using torch::from_blob. How can I reshape the tensor?

I guess that I can find the answer to both questions here: https://pytorch.org/cppdocs/api/namespace_at.html#namespace-at

However, the documentation is rather limited. I could therefore use a hand pointing me to the correct function. If possible, maybe with a little example 🙂

I’d first check the Python documentation. You probably want permute and reshape; pass the dimensions/shapes, e.g. t.permute({1, 2, 0}).
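For the merging route, torch::stack should also work; a sketch (assuming the three continuous CV_32FC1 Mats channel0/1/2 from your original post):

at::TensorOptions options(at::kFloat);
at::Tensor t0 = torch::from_blob(channel0.data, {channel0.rows, channel0.cols}, options);
at::Tensor t1 = torch::from_blob(channel1.data, {channel1.rows, channel1.cols}, options);
at::Tensor t2 = torch::from_blob(channel2.data, {channel2.rows, channel2.cols}, options);

// Stack along a new leading channel dimension, then add the batch dimension.
// stack copies the data, so the from_blob views need not be kept alive afterwards.
at::Tensor input_tensor = torch::stack({t0, t1, t2}, /*dim=*/0).unsqueeze(0); // 1 x 3 x H x W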

Best regards

Thomas

Dear Thomas,

Thanks for the input.

After studying what torch::from_blob really does (interpreting some allocated memory in a designated way, in my case as if it contains floating point numbers), I decided to simply provide a memory region with the correct data stored in it. I chose the classic “C-style” way using a pointer to an array. Although I would usually prefer a more C++-ish solution, it works for now.

Solution

In case anybody finds this thread and has the same problem, here is what I did to solve my problem:

float manual_input[channel_number][row_number][column_number]; // You would have to allocate the memory dynamically; a static array is used here only for brevity.
for (int y = 0; y < row_number; y++) {
    for (int x = 0; x < column_number; x++) {
        manual_input[0][y][x] = channel0.at<float>(cv::Point(x, y));
        std::cout << manual_input[0][y][x]; // Verifying my input data
    }
    std::cout << std::endl; // Verifying my input data
}
// Same loops for the other two channels

std::vector<int64_t> sizes = {1, channel_number, row_number, column_number};
at::TensorOptions options(at::kFloat);
at::Tensor input_tensor = torch::from_blob(manual_input, at::IntList(sizes), options);

channel0 as in the original post, channel_number, row_number and column_number according to the input size.

Inference with input_tensor as in the original post.

Thanks,
Thorsten

Glad you solved it!

If anyone is searching for a more compact and efficient solution, you might try something like

std::vector<int64_t> sizes = {1, network_input.rows, network_input.cols, 3};
torch::TensorOptions options(at::kFloat);
torch::Tensor input_tensor = torch::from_blob(network_input.data, at::IntList(sizes), options);

std::vector<torch::jit::IValue> inputs;
inputs.push_back(input_tensor.permute({0, 3, 1, 2}));

Note that input_tensor and thus also the permutation don’t own the memory and you need to keep network_input around or clone after from_blob.
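For example (a sketch of the clone variant):

// clone() copies the data, so the resulting tensor owns its memory and
// network_input no longer has to outlive it.
torch::Tensor input_tensor =
    torch::from_blob(network_input.data, at::IntList(sizes), options).clone();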

Best regards

Thomas

Nice! I can confirm that Thomas’s solution works as well as mine. Thomas’s solution is much more elegant, so I will use it. permute was the function I was looking for.

Thanks,
Thorsten

A tangential follow-up: we now have new-ABI binaries for libtorch that work with Boost. They can be found on
http://pytorch.org or at https://github.com/pytorch/pytorch/issues/17492#issuecomment-524692441.