Why does the same model give different results on CUDA and CPU?

Haha…
I had given up on LibTorch.
For speed I have now switched to TensorRT,
but hit another problem:
converting PyTorch to ONNX.
Life has so many mountains to climb…

TensorRT is another thing entirely…

It’s amazing that we hit exactly the same problem; our code looks almost the same too.

I’m going to dig into the problem for a while before I give up.

So how’s it going with your CRAFT model using TensorRT?

@NickKao Hi NickKao, I have solved the problem.

The problem comes from moving torch::Tensor data into a cv::Mat with the following line:

std::memcpy((void *) textmap.data, score_text.data_ptr(), torch::elementSize(torch::kFloat) * score_text.numel());

The reason is believed to be tensor contiguity in PyTorch. LibTorch-CUDA may have a bug when moving data from CUDA to CPU: if you inspect the model outputs on the CPU and CUDA devices directly, they are identical. But if you then copy the tensor data into a cv::Mat with a raw memcpy, a NON-CONTIGUOUS tensor gives you scrambled values, because memcpy reads the underlying buffer in storage order rather than in the logical element order.
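For what it’s worth, here is a minimal sketch of the copy I would do instead (the helper name tensor_to_mat is mine, and it assumes a 2-D float32 tensor like score_text): move the tensor to the CPU and force it contiguous before the memcpy, so the raw buffer order matches the logical element order.

#include <cstring>
#include <opencv2/core.hpp>
#include <torch/torch.h>

// Sketch: safely copy a 2-D float32 tensor into a cv::Mat.
cv::Mat tensor_to_mat(const torch::Tensor& t) {
    // .to(torch::kCPU) is a no-op if t is already on the CPU;
    // .contiguous() copies only when the layout is non-contiguous.
    torch::Tensor cpu = t.to(torch::kCPU).contiguous();
    cv::Mat mat(static_cast<int>(cpu.size(0)),
                static_cast<int>(cpu.size(1)), CV_32FC1);
    // Now the raw buffer is packed row-major, so memcpy is safe.
    std::memcpy(mat.data, cpu.data_ptr<float>(),
                sizeof(float) * cpu.numel());
    return mat;  // mat owns its own copy of the data
}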

As for the CRAFT model, just remove the permute operation before you trace the model.

As is known, operations like permute(), narrow(), expand() and transpose() return views with rearranged strides, so the result may be non-contiguous (and view() requires a contiguous input in the first place).
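You can verify this quickly; a small check like the following (just a sketch) shows that permute() flips is_contiguous() to false and that .contiguous() restores a packed layout:

#include <iostream>
#include <torch/torch.h>

int main() {
    torch::Tensor t = torch::rand({2, 3, 4});
    torch::Tensor p = t.permute({2, 0, 1});  // same data, new strides
    std::cout << t.is_contiguous() << "\n";               // 1 (true)
    std::cout << p.is_contiguous() << "\n";               // 0 (false)
    std::cout << p.contiguous().is_contiguous() << "\n";  // 1 (true)
}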

Hope this helps.

Anyway, this is how I solved the problem; I am NOT SURE whether it is the real underlying cause.