C++ API Exception (CPP/c10::Error) when using nearest_cuda

Hi,
I’m trying to use the C++ API of pytorch_cluster together with the LibTorch C++ API. I managed to build and install the former with WITH_CUDA=ON and tried to reproduce the nearest example below:

#include <iostream>

#include <torch/torch.h>
#include <torchcluster/cluster.h>

int main() {
    std::cout << torch::cuda::is_available() << std::endl;
    torch::Device device(torch::kCUDA);
    auto options = torch::TensorOptions()
        .dtype(torch::kFloat16)
        .layout(torch::kStrided)
        .device(device)
        .requires_grad(false);
    auto x = torch::tensor({{-1, -1}, {-1, 1}, {1, -1}, {1, 1}}, options);
    auto batch_x = torch::tensor({0, 0, 0, 0}, options);
    auto y = torch::tensor({{-1, 0}, {1, 0}}, options);
    auto batch_y = torch::tensor({0, 0}, options);
    auto cluster = nearest(x, y, batch_x, batch_y);
    std::cout << cluster << std::endl;
    return 0;
}

When execution reaches the nearest function (which calls nearest_cuda), I get the following exception:

Exception has occurred: CPP/c10::Error
Unhandled exception at 0x00007FFC29FBCD29 in TorchExample.exe: Microsoft C++ exception: c10::Error at memory location 0x000000125A1DEDC0.

More precisely, the error occurs when AT_DISPATCH_FLOATING_TYPES_AND is invoked.

Environment:

  • OS: Windows 10 Pro
  • Torch C++ installation: libtorch-win-shared-with-deps-1.13.1+cu117
  • Compiler: Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30147 for x64
  • Build variant: RelWithDebInfo
  • Torch Cluster built from 1.6.1 stable release

I’m sure that the torch_cluster build links correctly against LibTorch. I’ve already successfully used LibTorch alone with CUDA (including autograd functionality), but cannot tell whether this is really related to torch_cluster.

Any thoughts?

Thank you in advance

Based on the error message, these errors could be raised if some objects were not moved to the GPU, as described e.g. here. However, the actual error message on Windows doesn’t point to the failing operation. Are you able to reproduce the issue on Linux?

Thank you for the quick reply.
All four tensors are on the GPU (I checked each of them with is_cuda()). Unfortunately I don’t have a Linux machine to test on!

Another detail that might help: I had to copy torchcluster.dll from the install directory into the target application directory. I don’t know whether this breaks any implicit mechanism, but I’m already doing the same successfully for the torch libraries (based on the torch example).

Are you using this section directly to reproduce the issue on Windows?
If so, I could try to run it on Linux assuming I can figure out all dependencies.

Except for exporting the Python path to Torch_DIR, yes. In fact, since I’m using the LibTorch C++ distribution, Torch_DIR points to <some-root-path>/libtorch-win-shared-with-deps-1.13.1+cu117/libtorch/share/cmake/Torch. I’m currently trying to reproduce it with a conda/Python installation.