Why does the same model give different results on CUDA and CPU?

I tested a model and got different results on CUDA and on the CPU.
On the CPU, the model produces correct results.
On CUDA, I have to move the output to the CPU before I can process it, and the result is incorrect.

		torch::Tensor score;
		if (isCUDA) {
			torch::Tensor new_out_tensor = out_tensor.to(torch::kCPU).detach();
			score = new_out_tensor.squeeze(0); // [288, 384, 2]
		} else {
			score = out_tensor.squeeze(0).detach(); // [288, 384, 2]
		}

If there is no conversion to the CPU, a memory exception will occur.

Am I doing something wrong here?
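For reference, the copy-back itself is expected: host-side APIs can only read CPU memory, so a CUDA tensor has to be moved before its values are accessed. A minimal Python sketch of the same pattern (the C++ `.to(torch::kCPU)` corresponds to `.cpu()` here):

```python
import torch

x = torch.randn(2, 2)
if torch.cuda.is_available():
    x = x.cuda()  # move to GPU when one is available

# Reading raw values (e.g. via .numpy()) requires a CPU tensor,
# so copy back first -- the copy changes the device, not the values.
arr = x.cpu().numpy()
print(arr.shape)  # (2, 2)
```

So moving the output to the CPU cannot by itself change the numbers; the divergence must come from earlier in the pipeline.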

CPU result:

CUDA result:

Why do you detach() when you compute your scores? That would prevent gradients from flowing if you use this during training.


Actually, I don’t know why I use it; the code just came from the Internet.
Since I am not using it for training, only for inference, detach() shouldn’t cause a problem.
With detach() in place, the CPU path executes normally without issues.


If you just use this for evaluation, you should use the NoGradGuard (not sure about the exact name, but something along those lines) to completely disable autograd (that will make your code faster and use less memory)!
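The guard referred to here is `torch::NoGradGuard` in libtorch, the counterpart of `torch.no_grad()` in Python. A minimal sketch with a stand-in linear model:

```python
import torch

model = torch.nn.Linear(4, 2)  # stand-in model for illustration
model.eval()

x = torch.randn(1, 4)

# Disabling autograd for inference skips building the graph,
# which saves both memory and time.
with torch.no_grad():
    y = model(x)

print(y.requires_grad)  # False
```

In C++ the same effect is achieved by declaring `torch::NoGradGuard no_grad;` at the top of the inference scope.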

I guess there is something wrong in the way you send the model to the GPU. Are you sure that the weights / inputs to the network are properly sent to the GPU?
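One quick sanity check along these lines (a sketch; the model here is a placeholder for the real network) is to confirm that every parameter and the input all sit on the same device:

```python
import torch

model = torch.nn.Linear(4, 2)  # placeholder for the real network
x = torch.randn(1, 4)

if torch.cuda.is_available():
    model = model.cuda()
    x = x.cuda()

# Collect every device in use; a correct setup yields exactly one.
devices = {p.device for p in model.parameters()} | {x.device}
assert len(devices) == 1, f"mixed devices: {devices}"
print("all on", next(iter(devices)))
```

If this set ever contains more than one device, part of the model or input never made it to the GPU.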


This is the model-loading code:

void CRAFT::loadModel(string& model_path, bool& isRuntimeIn) {
	isRuntime = isRuntimeIn;
	if (isCUDA) {
		module = torch::jit::load(model_path, torch::kCUDA);
	} else {
		module = torch::jit::load(model_path);
	}
	assert(module != nullptr);
	std::cout << "ok\n";
}

As you can see, I switch between CUDA and CPU in the same program, so the method and process are exactly the same. I don’t know why the results differ.


I added NoGradGuard, and the CUDA result is still different.

torch::NoGradGuard no_grad_guard;

My forward code:

	if (isCUDA) {
		output = module.forward({ tensor_image.to(torch::kCUDA) });
	} else {
		output = module.forward({ tensor_image });
	}

You should be able to print the values of the different tensors to stdout (both CPU and GPU). Can you print the inputs, weights, and outputs to see where the difference appears?


I will be back next Thursday.
Then we can continue the discussion…


I tried writing the output to a file and stored it on my Google Drive.
Link:
The GPU and CPU really do get different results…


As I said above, you should inspect where the discrepancy first appears. Is it when you load the weights? In the inputs? Or when you apply the forward pass?


OK, I will try it.
But I want to put my model and code on GitHub. If you can, please try it out (please pay attention to the operating instructions: choose 1 for CUDA, 2 for CPU).

Once you run it, you will see that it is very strange.

Because the model is not small, please note that if you use the GPU, the GPU RAM should not be too small, otherwise the program will crash.


Hello, a classifier trained with ResNet50 works well in Python, but the same test data gives poor classification accuracy when called from C++ (libtorch). I don’t know why. Have you encountered this? Thank you!

Haha, not yet.
Maybe later…
I hope libtorch gets better…

Did you make sure to use the same preprocessing, etc.?
I would recommend checking the inputs first and getting the same values in the Python API and libtorch, then trying to narrow down the discrepancy.
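One way to check that first point, sketched with stand-in tensors (`py_input` and `cpp_input` are illustrative names for the preprocessed input saved from each pipeline, not anything from the thread):

```python
import torch

# Stand-ins: in practice, save the preprocessed input from each
# pipeline to disk and load both copies here for comparison.
py_input = torch.arange(12, dtype=torch.float32).reshape(1, 3, 2, 2) / 255.0
cpp_input = py_input.clone()

# If the inputs already differ, the bug is in preprocessing,
# not in the model or the device.
same = torch.allclose(py_input, cpp_input, rtol=1e-6, atol=1e-6)
print("inputs match:", same)
```

Only once the inputs agree to a tight tolerance is it worth comparing layer outputs further downstream.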

I have uploaded my code and model here:

It is the same process in one program; I just switch between GPU and CPU.

You are not the first one to have such a problem. I have a similar one here, and there is another unanswered one here.

I suggest you try to locate the source of the divergence yourself first; it makes it easier to help you. I had Python code, not C++, but I’ll share it here so you get the idea of how to locate the problem.

  1. Save off the intermediate variables during both CPU and GPU inference:

     torch.save(variable, "/path/to/varfile")

  2. Then afterwards, load both for analysis:

     cpuvar = torch.load("/path/to/varfile_cpu", map_location="cpu")
     gpuvar = torch.load("/path/to/varfile_gpu", map_location="cpu")

  3. Compare:

     close = torch.isclose(cpuvar, gpuvar, rtol=1e-04, atol=1e-04)
     print("SIMILAR", close[close==True].shape)
     print("FAR", close[close==False].shape)

The ideal case is where CPU and GPU have similar results for the same input. Compare all variables until you find the divergence.
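A self-contained version of the comparison step above, with stand-in tensors in place of the `torch.load`-ed variables from the two runs:

```python
import torch

# Stand-ins for the saved CPU and GPU variables; the "GPU" copy
# gets tiny noise to mimic benign floating-point drift.
cpuvar = torch.randn(3, 3)
gpuvar = cpuvar + 1e-6 * torch.randn(3, 3)

close = torch.isclose(cpuvar, gpuvar, rtol=1e-4, atol=1e-4)
print("SIMILAR", int(close.sum()))   # 9  (all elements within tolerance)
print("FAR", int((~close).sum()))    # 0
```

Drift of around 1e-6 between devices is normal floating-point behavior; a real divergence shows up as a large "FAR" count at some layer, and that layer is where to look.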

I will try it.
But it’s very strange: Python PyTorch on the GPU (CUDA 10) runs normally.

Thank you, I will try it first.

Why does CRAFT behave so strangely? I didn’t encounter this problem with other models. Have you figured it out now?