Debugging runtime error module->forward(inputs) libtorch 1.4

I have a question related to this project https://github.com/NathanUA/U-2-Net/blob/7e5ff7d4c3becfefbb6e3d55916f48c7f7f5858d/u2net_test.py#L104

I can trace the net like this:

traced_script_module = torch.jit.trace(net, inputs_test)
traced_script_module.save("traced_model.pt")
print(inputs_test.size()) # shows (1, 3, 320, 320)

Now I’m trying to run the model in a C++ application. I was able to do this in a prior project, https://github.com/DBraun/PyTorchTOP-cpumem, where I used CMake and built in debug mode by running SET DEBUG=1 before the CMake commands.

In the C++ project for U-2-Net, I can load the model into a module with no errors. When I call

// torchinputs is a std::vector<torch::jit::IValue>
torchinputs.clear();
torchinputs.push_back(torch::ones({ 1, 3, 320, 320 }, torch::kCUDA).to(at::kFloat));
module.forward(torchinputs); // throws std::runtime_error here

I get

Unhandled exception at 0x00007FFFD8FFA799 in TouchDesigner.exe: Microsoft C++ exception: std::runtime_error at memory location 0x000000EA677F1B30 occurred.

The error is thrown at https://github.com/pytorch/pytorch/blob/4c0bf93a0e61c32fd0432d8e9b6deb302ca90f1e/torch/csrc/jit/api/module.h#L112, which says inputs has size 0. However, I’m fairly sure I’m passing non-empty data of size (1, 3, 320, 320) to module->forward(): https://github.com/DBraun/PyTorchTOP-cpumem/blob/f7cd16cb84021a7fc3681cad3a66c2bd7551a572/src/PyTorchTOP.cpp#L294

[screenshot: stack trace at module->forward(torchinputs)]
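Since the exception is otherwise unhandled, one way to see the actual error text is to wrap the forward call in a try/catch and print what(). A minimal sketch, assuming module and torchinputs are set up as above (most libtorch errors are thrown as c10::Error):

try {
	auto output = module.forward(torchinputs);
}
catch (const c10::Error& e) {
	std::cout << "c10::Error: " << e.what() << std::endl;
}
catch (const std::runtime_error& e) {
	std::cout << "std::runtime_error: " << e.what() << std::endl;
}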

I thought it might be a DLL issue, but I’ve copied all the DLLs from libtorch/lib.

I can confirm CUDA is available and that I was using CUDA when I traced the module:

#include <windows.h>   // LoadLibraryA
#include <torch/torch.h>
#include <iostream>

// Explicitly load the CUDA DLLs in case they are not pulled in automatically.
LoadLibraryA("c10_cuda.dll");
LoadLibraryA("torch_cuda.dll");

try {
	std::cout << "CUDA:   " << torch::cuda::is_available() << std::endl;
	std::cout << "CUDNN:  " << torch::cuda::cudnn_is_available() << std::endl;
	std::cout << "GPU(s): " << torch::cuda::device_count() << std::endl;
}
catch (std::exception& ex) {
	std::cout << ex.what() << std::endl;
}
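As an extra sanity check beyond torch::cuda::is_available(), it can help to actually launch a CUDA kernel: if the CUDA backend is not registered correctly (e.g. a DLL loading problem), a simple op can throw even though is_available() prints 1. A minimal sketch:

try {
	// Force a real CUDA kernel launch, not just a device query.
	auto t = torch::ones({ 2, 2 }, torch::kCUDA);
	std::cout << "CUDA op ok, sum = " << (t + t).sum().item<float>() << std::endl;
}
catch (const std::exception& ex) {
	std::cout << "CUDA op failed: " << ex.what() << std::endl;
}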

Trying to fix the runtime exception on module->forward, I thought maybe @torch.jit.script needed to be applied to some of the functions in the U-2-Net project, like here: https://github.com/NathanUA/U-2-Net/blob/7e5ff7d4c3becfefbb6e3d55916f48c7f7f5858d/model/u2net.py#L24. I was worried about calling shape[2:] in a function without @torch.jit.script. Should I not be worried?

Any advice is appreciated!

I’ve also followed all the instructions here: An unhandled exception: Microsoft C++ exception: c10::Error at memory location

Have you moved your model to CUDA? The model will be on CPU by default if you call torch::jit::load.

Thanks for your suggestion. I tried

module = torch::jit::load("traced_model.pt", torch::kCUDA);
module.to(torch::kCUDA);

but got the same result. I have the debug DLLs and libraries ready for more debugging. Is there anything more I can do to help?

I’m stepping through line by line. I noticed that the module.forward() call takes about 18 seconds before the exception, and this happens even when I knowingly give it a wrongly sized tensor:

torchinputs.push_back(torch::ones({1, 1, 1, 1}, torch::kCUDA).to(torch::kFloat)); // intentionally wrong size
module.forward(torchinputs);

If I change everything in my code to CPU, it doesn’t throw a runtime error, so I must not be succeeding in getting everything onto CUDA. I also tried following everything here: https://github.com/pytorch/pytorch/issues/19302

Why isn’t this sufficient to put everything on CUDA?

auto module = torch::jit::load("traced_model.pt", torch::kCUDA);
for (auto p : module.parameters()) {
	std::cout << p.device() << std::endl; // cuda:0
}

auto finalinput = torch::ones({ 1, 3, 320, 320 }, torch::TensorOptions().dtype(torch::kFloat).device(torch::kCUDA));
std::cout << "finalinput device: " << finalinput.device() << std::endl; // cuda:0
std::vector<torch::jit::IValue> torchinputs;
torchinputs.push_back(finalinput);
auto forward_result = module.forward(torchinputs); // throws std::runtime_error

^ and merely changing those two kCUDA references to kCPU makes the error go away.
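One more check that might be worth doing: parameters() does not include registered buffers (e.g. the running statistics of BatchNorm layers), so their devices can be printed separately. A minimal sketch, assuming module.buffers() is available in this libtorch version:

for (auto b : module.buffers()) {
	std::cout << "buffer device: " << b.device() << std::endl; // expect cuda:0
}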


I read everything here https://pytorch.org/tutorials/advanced/cpp_export.html and tried at::kCUDA instead of torch::kCUDA. I tried the 1.5 nightly debug libtorch but ran into other problems I couldn’t solve, so I need to stick with 1.4 for now.

I have the same issue. I’m using UNet for inference (libtorch 1.5.0, CUDA 9.2).