Having a problem working with C++ API and CUDA

Hi everyone, I have a problem that I cannot solve no matter what I try. I exported my model from Python to C++, and I can use it properly on the CPU, but when I try to use it with CUDA I get an error.

std::string model_path = "/home/bzeren/leo/Super-Point-Glue/SuperGlueCpp/superglue_outdoor.pt";
torch::jit::script::Module module;
module = torch::jit::load(model_path);

at::Tensor image0 = torch::rand({1, 1, 480, 640}).to(at::kCUDA);
at::Tensor image1 = torch::rand({1, 1, 480, 640}).to(at::kCUDA);
at::Tensor keypoints0 = torch::rand({1, 1000, 2}).to(at::kCUDA);
at::Tensor keypoints1 = torch::rand({1, 1000, 2}).to(at::kCUDA);
at::Tensor scores0 = torch::rand({1, 1000}).to(at::kCUDA);
at::Tensor scores1 = torch::rand({1, 1000}).to(at::kCUDA);
at::Tensor descriptors0 = torch::rand({1, 256, 1000}).to(at::kCUDA);
at::Tensor descriptors1 = torch::rand({1, 256, 1000}).to(at::kCUDA);

auto input = torch::Dict<std::string, at::Tensor>();
input.insert("image0", image0);
input.insert("image1", image1);
input.insert("keypoints0", keypoints0);
input.insert("keypoints1", keypoints1);
input.insert("scores0", scores0);
input.insert("scores1", scores1);
input.insert("descriptors0", descriptors0);
input.insert("descriptors1", descriptors1);

std::vector<torch::jit::IValue> inputs;
inputs.push_back(input);

auto output = module.forward(inputs).toGenericDict();

This throws:
terminate called after throwing an instance of 'std::runtime_error'
  what():  The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/models/superglue.py", line 35, in forward
    _6 = torch.slice(center, 0, 0, 9223372036854775807)
    _7 = torch.slice(torch.unsqueeze(_6, 1), 2, 0, 9223372036854775807)
    _8 = torch.sub(kpts0, _7)
         ~~~~~~~~~ <--- HERE
    _9 = torch.slice(scaling, 0, 0, 9223372036854775807)
    _10 = torch.slice(torch.unsqueeze(_9, 1), 2, 0, 9223372036854775807)

Traceback of TorchScript, original code (most recent call last):
/home/bzeren/leo/Super-Point-Glue/SuperGluePretrainedNetwork/models/superglue.py(72): normalize_keypoints
/home/bzeren/leo/Super-Point-Glue/SuperGluePretrainedNetwork/models/superglue.py(245): forward
/home/bzeren/leo/Super-Point-Glue/SuperGluePretrainedNetwork/venv/lib/python3.8/site-packages/torch/nn/modules/module.py(1118): _slow_forward
/home/bzeren/leo/Super-Point-Glue/SuperGluePretrainedNetwork/venv/lib/python3.8/site-packages/torch/nn/modules/module.py(1130): _call_impl
/home/bzeren/leo/Super-Point-Glue/SuperGluePretrainedNetwork/venv/lib/python3.8/site-packages/torch/jit/_trace.py(967): trace_module
/home/bzeren/leo/Super-Point-Glue/SuperGluePretrainedNetwork/venv/lib/python3.8/site-packages/torch/jit/_trace.py(750): trace
/home/bzeren/leo/Super-Point-Glue/SuperGluePretrainedNetwork/pytorch2libtorch.py(44): <module>
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I would be glad if you could help. Thanks.

The error says two devices were mixed: cuda:0 and cpu. Your input tensors are on CUDA, but the loaded module is still on the CPU. You need to have the model and all of its inputs on the same device, for example by moving the module to CUDA right after loading it (module.to(at::kCUDA);).


Actually, my model is a pre-trained model, and when I use it from Python I can run it on both CPU and CUDA with no problem, so I think the cause might be something else. Are there other situations to watch out for when using it from C++?

Finally, I edited the serialized code inside the .pt archive that I obtained with the torch.jit.trace function, and replaced every to('cpu') with to('cuda'). It is a bit of a hacky solution, but it works. I think this problem could also be solved by editing the model class.
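For anyone hitting the same thing: that archive edit can be scripted. A TorchScript .pt file is a zip archive whose serialized code lives under code/. Below is a minimal sketch, not a definitive tool; the patch_archive name is made up here, and the exact to('cpu') spelling inside the serialized code is an assumption, so inspect your own archive before running a blind replace:

```python
import zipfile

def patch_archive(src, dst, old=b"to('cpu')", new=b"to('cuda')"):
    """Copy the TorchScript zip archive src to dst, rewriting the
    serialized-code members (*.py) so that `old` becomes `new`.
    All other members (weights, pickles) are copied unchanged."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.endswith(".py"):
                data = data.replace(old, new)
            # Writing with the original ZipInfo keeps member metadata;
            # writestr recomputes size and CRC from the new data.
            zout.writestr(item, data)

# patch_archive("superglue_outdoor.pt", "superglue_outdoor_cuda.pt")
```

Writing the patched copy to a second file keeps the original model untouched in case the byte-level replacement breaks something.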