Why does the same model give different results on CUDA and on the CPU?

Haha!
Actually, I'm not sure why I use it; it just came from code I found on the Internet.
Since I'm not using it for training, only for prediction, the implementation itself should not be the problem.
The part with detach() runs normally on the CPU without any problems.


If you just use this for evaluation, you should use NoGradGuard (not sure about the exact name, but something along those lines) to completely disable autograd (that will make your code faster and use less memory)!

I guess there is something wrong in the way you send the model to the GPU. Are you sure that the weights / inputs to the network are properly sent to the GPU?
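
For example, a minimal sketch of an inference-only setup in libtorch (the model path and input shape are placeholders, and it assumes the scripted model returns a single tensor):

#include <torch/script.h>
#include <iostream>

int main() {
    // Disable autograd for inference: faster and uses less memory.
    torch::NoGradGuard no_grad;

    // Load the TorchScript model directly onto the GPU and switch to eval mode.
    torch::jit::script::Module module = torch::jit::load("model.pt", torch::kCUDA);
    module.eval();

    // The input must live on the same device as the weights.
    torch::Tensor input = torch::rand({1, 3, 224, 224}).to(torch::kCUDA);
    torch::Tensor output = module.forward({input}).toTensor();

    std::cout << output.sizes() << std::endl;
    return 0;
}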


This is the model-loading code:

void CRAFT::loadModel(string& model_path, bool& isRuntimeIn) {
	isRuntime = isRuntimeIn;
	//at::init_num_threads();
	if (isCUDA) {
		// Load the TorchScript model directly onto the GPU.
		module = torch::jit::load(model_path, torch::kCUDA);
	}
	else {
		// Load on the CPU (the default device).
		module = torch::jit::load(model_path);
	}

	assert(module != nullptr);
	std::cout << "ok\n";
}

As you can see, I switch between CUDA and CPU in the same program, so the whole method and process are identical. I don't understand why the results are different.


I added NoGradGuard, but the CUDA result is still different.

torch::NoGradGuard no_grad_guard;

My forward code:

	if (isCUDA) {
		// Move the input tensor to the GPU so it matches the device of the loaded model.
		output = module.forward({ tensor_image.to(torch::kCUDA) });
	}
	else {
		output = module.forward({ tensor_image });
	}

You should be able to print the values of the different tensors to stdout (both CPU and GPU). Can you print the inputs, weights, and outputs to see where the difference appears?
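
For example, a hedged sketch of a helper you could add on the C++ side (printStats is my own name, not from this thread; adapt the toTensor() unpacking if your model returns a tuple):

#include <torch/script.h>
#include <iostream>
#include <string>

// Print a few summary statistics of a tensor so the CPU and GPU runs
// can be compared side by side in the console.
void printStats(const torch::Tensor& t, const std::string& name) {
    auto c = t.detach().to(torch::kCPU).to(torch::kFloat);
    std::cout << name
              << " min="  << c.min().item<float>()
              << " max="  << c.max().item<float>()
              << " mean=" << c.mean().item<float>()
              << " sum="  << c.sum().item<float>() << std::endl;
}

Calling something like printStats(tensor_image, "input") and printStats(output.toTensor(), "output") in both the CPU run and the GPU run should show where the numbers start to diverge.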


Thank you.
I will be back next Thursday; then we can continue the discussion…


I tried writing the outputs to files and stored them on my Google Drive:
https://drive.google.com/drive/folders/1vK3ixC792Kkv-OOD0ckX1yF_pzoDprJ1?usp=sharing
The GPU and CPU really do give different results…


As I said above, you should inspect where the discrepancy appears. Is it when you load the weights? On the inputs? Or when you apply the forward pass?
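
For instance, a hedged sketch of one way to check the weights (printWeightChecksums is a made-up helper; run it once after the CPU load and once after the CUDA load and compare the printed sums):

#include <torch/script.h>
#include <iostream>

// Print a simple checksum for every parameter of a loaded TorchScript module.
// Identical weights should print identical sums on both devices,
// up to floating-point noise.
void printWeightChecksums(torch::jit::script::Module& module) {
    for (const auto& p : module.named_parameters()) {
        auto w = p.value.to(torch::kCPU).to(torch::kFloat);
        std::cout << p.name << " sum=" << w.sum().item<float>() << std::endl;
    }
}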


Ok, I will try it out.
I also want to put my model and code on GitHub; once it is up, please try it out if you can (pay attention to the run instructions: choose 1 for CUDA, 2 for CPU).

Once you run it you will see what I mean; it seems very strange.

Because the model is not small, please note that if you use the GPU it needs enough GPU RAM, otherwise the program will crash.


Hello, a classifier trained with ResNet50 works well in Python, but the same test data gets poor classification accuracy when called from C++ (libtorch). I don't know why. Have you run into this? Thank you!

Haha, not yet.
Maybe later…
I hope libtorch gets better…

Did you make sure to use the same preprocessing etc.?
I would recommend checking the inputs first, making sure you get the same values in the Python API and in libtorch, and then trying to narrow down the discrepancy.
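
For example, a minimal sketch of printing the first few input values on the libtorch side, so they can be compared against the same print in the Python pipeline (the random tensor is only a placeholder for your real preprocessed input):

#include <torch/torch.h>
#include <iostream>

int main() {
    // Placeholder: replace with the actual preprocessed input tensor.
    torch::Tensor tensor_image = torch::rand({1, 3, 224, 224});

    // Print the shape and the first few values; the Python side should print
    // exactly the same numbers if the preprocessing matches.
    std::cout << "shape: " << tensor_image.sizes() << std::endl;
    std::cout << tensor_image.flatten().slice(/*dim=*/0, /*start=*/0, /*end=*/10) << std::endl;
    return 0;
}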

I have uploaded my code and model here:

It is the same process in one program; it just switches between GPU and CPU.

You are not the first one to have such a problem. I have a similar one here, and there is another unanswered one here.

I suggest you try to locate the source of the divergence yourself first; that makes it easier to help you. My code was Python, not C++, but I'll share it here so you get the idea of how to locate the problem.

  1. Save off the intermediate variables from the CPU and GPU inference runs:
     torch.save(variable, "/path/to/varfile")
  2. Then afterwards load both for analysis:
     cpuvar = torch.load("/path/to/varfile_cpu", map_location="cpu")
     gpuvar = torch.load("/path/to/varfile_gpu", map_location="cpu")
  3. Compare:
     close = torch.isclose(cpuvar, gpuvar, rtol=1e-04, atol=1e-04)
     print("SIMILAR", close[close==True].shape)
     print("FAR", close[close==False].shape)

In the ideal case, CPU and GPU will have similar results for the same input. Compare all the variables until you find the divergence.

Thanks, I will try it.
But it's very strange: with PyTorch in Python on the GPU (CUDA 10) it runs normally.

Thank you, I will try it first.

Why does CRAFT behave so strangely? I didn't run into this problem with other models. Have you figured it out yet?

Haha…
I have given up on libtorch.
For speed I have now switched to TensorRT,
but I hit another problem: converting PyTorch to ONNX.
Life has so many mountains to climb…

TensorRT is another thing entirely…

It's amazing that we hit exactly the same problem; our code looks the same too.

I'm going to dig into the problem for a while before I give up.

So how is it going with your CRAFT model using TensorRT?

@NickKao Hi, NickKao. I have solved the problem.

The problem comes from moving torch::Tensor data to cv::Mat with the following line:

std::memcpy((void *) textmap.data, score_text.data_ptr(), torch::elementSize(torch::kFloat) * score_text.numel());

The reason is believed to be related to tensor contiguity in PyTorch. libtorch with CUDA may have some issue when moving data from CUDA to the CPU: the model outputs checked on the CPU and CUDA devices are the same, but if you then move the tensor data from the CPU tensor into a cv::Mat with a raw memory copy, the data may NOT be CONTIGUOUS.

As for the CRAFT model, just remove the permute operation when you trace the model.

As is known, view(), permute(), narrow(), expand() and transpose() can all produce non-contiguous tensors.

Hope this helps.

Anyway, I solved the problem by doing this; I am NOT SURE whether it is the real underlying reason.
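
For reference, a minimal sketch of a copy pattern that avoids the contiguity issue (tensorToMat is my own helper name, and it assumes score_text is a 2-D float score map; it is not necessarily the exact fix described above):

#include <torch/torch.h>
#include <opencv2/opencv.hpp>
#include <cstring>

// Copy a 2-D float tensor into a cv::Mat. The key point is calling
// .to(torch::kCPU).contiguous() first, so the raw memcpy below reads a
// densely packed buffer even if the tensor came from permute() or view().
cv::Mat tensorToMat(const torch::Tensor& score_text) {
    torch::Tensor t = score_text.detach().to(torch::kCPU).to(torch::kFloat).contiguous();
    cv::Mat textmap(static_cast<int>(t.size(0)), static_cast<int>(t.size(1)), CV_32FC1);
    std::memcpy(textmap.data, t.data_ptr<float>(), sizeof(float) * t.numel());
    return textmap;
}

Alternatively, removing the permute() from the traced model, as suggested above, avoids the non-contiguous layout in the first place.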