Cannot use CUDA in libtorch after update to 1.8.0

I've built a library that calls some libtorch functions from C#. While waiting for v1.8 I've been using the nightly build, and it has worked fine. However, after upgrading to 1.8.0 all CUDA unit tests fail with the message shown below. I made sure to have the latest NVIDIA driver and updated my CUDA SDK to 11.1.

Does anyone know what has changed, or has anyone else seen this problem?


Message:
Test method UnitTestCSTorch.UnitTest_CompareOperators.TestTensor_Eq_CUDA threw exception:
System.Exception: Could not run ‘aten::empty.memory_format’ with arguments from the ‘CUDA’ backend. This could be because the operator doesn’t exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. ‘aten::empty.memory_format’ is only available for these backends: [CPU, MkldnnCPU, SparseCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

    CPU: registered at aten\src\ATen\RegisterCPU.cpp:5925 [kernel]
    MkldnnCPU: registered at aten\src\ATen\RegisterMkldnnCPU.cpp:284 [kernel]
    SparseCPU: registered at aten\src\ATen\RegisterSparseCPU.cpp:557 [kernel]
    BackendSelect: registered at aten\src\ATen\RegisterBackendSelect.cpp:596 [kernel]
    Named: registered at ..\..\aten\src\ATen\core\NamedRegistrations.cpp:7 [backend fallback]
    AutogradOther: registered at ..\..\torch\csrc\autograd\generated\VariableType_4.cpp:8707 [autograd kernel]
    AutogradCPU: registered at ..\..\torch\csrc\autograd\generated\VariableType_4.cpp:8707 [autograd kernel]
    AutogradCUDA: registered at ..\..\torch\csrc\autograd\generated\VariableType_4.cpp:8707 [autograd kernel]
    AutogradXLA: registered at ..\..\torch\csrc\autograd\generated\VariableType_4.cpp:8707 [autograd kernel]
    AutogradNestedTensor: registered at ..\..\torch\csrc\autograd\generated\VariableType_4.cpp:8707 [autograd kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at ..\..\torch\csrc\autograd\generated\VariableType_4.cpp:8707 [autograd kernel]
    AutogradPrivateUse1: registered at ..\..\torch\csrc\autograd\generated\VariableType_4.cpp:8707 [autograd kernel]
    AutogradPrivateUse2: registered at ..\..\torch\csrc\autograd\generated\VariableType_4.cpp:8707 [autograd kernel]
    AutogradPrivateUse3: registered at ..\..\torch\csrc\autograd\generated\VariableType_4.cpp:8707 [autograd kernel]
    Tracer: registered at ..\..\torch\csrc\autograd\generated\TraceType_4.cpp:10612 [kernel]
    Autocast: fallthrough registered at ..\..\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
    Batched: registered at ..\..\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
    VmapMode: fallthrough registered at ..\..\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]
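For context, the failing call boils down to allocating a tensor on the CUDA device, which routes through the `aten::empty.memory_format` operator named in the error. A minimal sketch in the equivalent Python API (assuming the matching PyTorch wheel is installed) reproduces the same dispatcher path:

```python
import torch

# Allocating a tensor on the CUDA device dispatches to
# aten::empty.memory_format for the CUDA backend. On an affected
# 1.8.0+cu111 build this raises the dispatcher error quoted above;
# on a healthy CUDA build it succeeds.
if torch.cuda.is_available():
    t = torch.empty((2, 2), device="cuda")
    print(t.device)
else:
    # A build with missing CUDA kernels typically also reports
    # CUDA as unavailable, even with a working driver.
    print("CUDA not available in this build")
```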

Hi,

Thanks for reporting this. It looks very similar to libtorch 1.8.0 with CUDA 11.1: CUDA error: no kernel image is available for execution · Issue #53476 · pytorch/pytorch · GitHub
Can you confirm whether it is the same or not?

Can you please specify which package you've downloaded and which GPUs you have locally?
I wonder if this is similar to libtorch 1.8.0 with CUDA 11.1: CUDA error: no kernel image is available for execution · Issue #53476 · pytorch/pytorch · GitHub

I saw that issue on GitHub and left a comment that it seemed very similar. I cannot say whether it is the same, but it is probably a good lead. I also downloaded the version built for CUDA 10.2, and that works fine. The nightly build has the same problem for CUDA 11.1 but works just fine for CUDA 10.2, so the same pattern holds there. The problem is very consistent: no CUDA functions work, while all CPU functions do. (Except that some FFTs seem to be broken in Release mode but not Debug mode; I will file a report on that after doing some investigation.)

I have previously tested the nightly build for 1.8 (downloaded in December with CUDA 11.0), and that worked fine (except for the FFTs).

I'm using a Win10 PC with a GTX 1650 Ti (4 GB). I have tested two different configurations:

CUDA 11.1 - not working
CUDA SDK 11.1: CUDA Toolkit 11.1 Update 1 | NVIDIA Developer
cuDNN: v8.1.0 (January 26th, 2021), for CUDA 11.0, 11.1 and 11.2
LibTorch release: https://download.pytorch.org/libtorch/cu111/libtorch-win-shared-with-deps-1.8.0%2Bcu111.zip
LibTorch debug: https://download.pytorch.org/libtorch/cu111/libtorch-win-shared-with-deps-debug-1.8.0%2Bcu111.zip

CUDA 10.2 - working
CUDA SDK 10.2: CUDA Toolkit 10.2 | NVIDIA Developer
cuDNN: v8.0.5 (November 9th, 2020), for CUDA 10.2
LibTorch release: https://download.pytorch.org/libtorch/cu102/libtorch-win-shared-with-deps-1.8.0.zip
LibTorch debug: https://download.pytorch.org/libtorch/cu102/libtorch-win-shared-with-deps-debug-1.8.0.zip
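When comparing two builds like this, one quick sanity check is to inspect which CUDA version and GPU architectures the binary was compiled for. This is a sketch using the Python wheel's introspection helpers (assuming the PyTorch wheel matching the libtorch download is installed; the libtorch zip ships the same compiled kernels):

```python
import torch

# Which build is this, and which CUDA toolkit was it compiled against?
print(torch.__version__)    # e.g. "1.8.0+cu111"
print(torch.version.cuda)   # e.g. "11.1", or None for a CPU-only build

print(torch.cuda.is_available())
if torch.cuda.is_available():
    # Compute capabilities baked into the binary. A GTX 1650 Ti is
    # sm_75 (Turing); if "sm_75" is missing from this list, CUDA
    # kernels for that card were not compiled into the build, which
    # matches the symptom in issue #53476.
    print(torch.cuda.get_arch_list())
```

If `sm_75` (or a compatible PTX entry) is absent from the cu111 build but present in the cu102 build, that would explain why only the CUDA 10.2 configuration works on this card.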

I have the same problem.
My CUDA version is 11.1, and the only build where torch.cuda.device_count() detects the GPU is libtorch 1.10.2+cu102; higher versions like 1.11.0 and lower versions like 1.9.0 don't work.