Hi,
I’m trying to use a model which I trained in Python in C++ for inference. While being in Python, the model produces reasonable outputs. However, when using it for inference in C++, the output does not make sense. My feeling is that the input/output formats are somehow wrong (type, size, etc.). In the following, I will detail the steps for:
- Exporting the model via tracing (in Python)
- Loading the model (in C++)
- Preparing the input data for the model (from OpenCV data structures in C++)
- Forward pass
- Converting the output data of the network back to OpenCV
Any idea or help would be greatly appreciated. I already checked a lot of resources both here and on Stack Overflow, and other sources. According to what I found, I cannot identify the error.
Thanks in advance,
Thorsten
Exporting the model
model = model.to('cpu')
model.train(False)
example = torch.rand(1, 3, height, width)
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save(save_filename)
Moving model to CPU as this is were inference will be done (does it matter?).
Disabling training mode as I am using “training-only” features like batch normalization.
Loading model
std::ifstream is(network_file_path, std::ifstream::binary);
std::shared_ptr<torch::jit::script::Module> network = torch::jit::load(is);
Input data preparation
There are three data channels which are stacked like a three layer image. Except that the channels do not represent RGB or the channels of another color format but our own data.
data0, data1 and data2 contain 8_bit data (value range 0 to 255). For the training, this was converted to float and scaled to the range 0 to 1.0. All channels have the same size.
So, converting to float and scaling:
cv::Mat channel0(data0.size(), CV_32FC1);
cv::Mat channel1(data1.size(), CV_32FC1);
cv::Mat channel2(data2.size(), CV_32FC1);
float scale_factor = 1 / 255.0;
data0.convertTo(channel0, CV_32F, scale_factor);
data1.convertTo(channel1, CV_32F, scale_factor);
data2.convertTo(channel2, CV_32F, scale_factor);
Now, the three channels are combined to one cv::Mat with three channels:
std::vector<cv::Mat> network_input_channels;
network_input_channels.push_back(channel0);
network_input_channels.push_back(channel1);
network_input_channels.push_back(channel2);
cv::Mat network_input(channel0.size(), CV_32FC3);
cv::merge(network_input_channels, network_input);
The input is converted to a tensor:
std::vector<int64_t> sizes = {1, 3, network_input.rows, network_input.cols};
at::TensorOptions options(at::kFloat);
at::Tensor input_tensor = torch::from_blob(network_input.data, at::IntList(sizes), options);
std::vector<torch::jit::IValue> inputs;
inputs.push_back(input_tensor);
When I examine the values of the input tensor via std::cout
and compare the values with the expected values for my data from the channels above, the values are different. Therefore, I suspect that the error has already happened until this point. However, I cannot rule out that the data is just interpretetet differently by the tensor and everything is still fine. Therefore, for the sake of completeness, the next steps in the pipeline.
Forward pass
auto output_tensor = network->forward(inputs).toTensor();
Converting data back to OpenCV data structures
cv::Mat output_tmp(cv::Size(width, height) , CV_32FC1, output_tensor.data<float>());
cv::Mat network_output(output_tmp.size(), CV_8UC1);
float scale_factor_up = 255.0;
output_tmp.convertTo(network_output, CV_8U, scale_factor_up);
As expected, the values in the output do not make sense. Most of the output is saturated (maxed out or zero). I blame the wrong input for that and not the model because in Python I get fine results.
If anybody could give me any hints I would be very thankful for your help.
Best,
Thorsten