torch::cuda::cudnn_is_available() returns false

NickKao · September 16, 2019, 7:43am

When I test in Python, I get True.

>>> torch.cuda.is_available()
True

But in libtorch I always get false.

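#include <torch/torch.h>
#include <iostream>

int main(int argc, const char* argv[]) {

	torch::DeviceType device_type;

	// Prints only when this libtorch build reports cuDNN as available.
	if (torch::cuda::cudnn_is_available()) {
		std::cout << "cudnn_is_available" << std::endl;
	}
	return 0;
}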

I am using the CUDA 10.0 build of libtorch on Windows.
What is wrong?

NickKao · September 16, 2019, 7:47am

My CMake output:

D:\WS\WS_CPP\torch_begin\libtorch-gpu\build>cmake -G "Visual Studio 14 2015 Win64" ../
-- Selecting Windows SDK version  to target Windows 10.0.15063.
-- The C compiler identification is MSVC 19.0.24215.1
-- The CXX compiler identification is MSVC 19.0.24215.1
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
D:/WS/WS_CPP/torch_begin/libtorch-gpu
D:/WS/WS_CPP/torch_begin/libtorch-gpu
-- OpenCV ARCH: x64
-- OpenCV RUNTIME: vc14
-- OpenCV STATIC: OFF
-- Found OpenCV: D:/dev-soft/c_include/opencv4.0.1/build (found version "4.0.1")
-- Found OpenCV 4.0.1 in D:/dev-soft/c_include/opencv4.0.1/build/x64/vc14/lib
-- You might need to add D:\dev-soft\c_include\opencv4.0.1\build\x64\vc14\bin to your PATH to be able to run your applications.
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
-- Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0 (found version "10.0")
-- Caffe2: CUDA detected: 10.0
-- Caffe2: CUDA nvcc is: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/bin/nvcc.exe
-- Caffe2: CUDA toolkit directory: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0
-- Caffe2: Header version is: 10.0
-- Found CUDNN: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/include
-- Found cuDNN: v7.6.3  (include: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/include, library: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/lib/x64/cudnn.lib)
CMake Warning (dev) at D:/dev-soft/libtorch1.2.0_GPU/share/cmake/Caffe2/public/cuda.cmake:377 (if):
  Policy CMP0054 is not set: Only interpret if() arguments as variables or
  keywords when unquoted.  Run "cmake --help-policy CMP0054" for policy
  details.  Use the cmake_policy command to set the policy and suppress this
  warning.

  Quoted variables like "MSVC" will no longer be dereferenced when the policy
  is set to NEW.  Since the policy is not set the OLD behavior will be used.
Call Stack (most recent call first):
  D:/dev-soft/libtorch1.2.0_GPU/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
  D:/dev-soft/libtorch1.2.0_GPU/share/cmake/Torch/TorchConfig.cmake:40 (find_package)
  CMakeLists.txt:20 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Autodetected CUDA architecture(s):  6.1
-- Added CUDA NVCC flags for: -gencode;arch=compute_61,code=sm_61
CMake Warning (dev) at D:/dev-soft/libtorch1.2.0_GPU/share/cmake/Caffe2/public/utils.cmake:57 (if):
  Policy CMP0054 is not set: Only interpret if() arguments as variables or
  keywords when unquoted.  Run "cmake --help-policy CMP0054" for policy
  details.  Use the cmake_policy command to set the policy and suppress this
  warning.

  Quoted variables like "MSVC" will no longer be dereferenced when the policy
  is set to NEW.  Since the policy is not set the OLD behavior will be used.
Call Stack (most recent call first):
  D:/dev-soft/libtorch1.2.0_GPU/share/cmake/Caffe2/Caffe2Config.cmake:121 (caffe2_interface_library)
  D:/dev-soft/libtorch1.2.0_GPU/share/cmake/Torch/TorchConfig.cmake:40 (find_package)
  CMakeLists.txt:20 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) at D:/dev-soft/libtorch1.2.0_GPU/share/cmake/Torch/TorchConfig.cmake:90 (if):
  Policy CMP0054 is not set: Only interpret if() arguments as variables or
  keywords when unquoted.  Run "cmake --help-policy CMP0054" for policy
  details.  Use the cmake_policy command to set the policy and suppress this
  warning.

  Quoted variables like "MSVC" will no longer be dereferenced when the policy
  is set to NEW.  Since the policy is not set the OLD behavior will be used.
Call Stack (most recent call first):
  CMakeLists.txt:20 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Found torch: D:/dev-soft/libtorch1.2.0_GPU/lib/torch.lib
-- Pytorch status:
--     libraries: torch;torch_library;D:/dev-soft/libtorch1.2.0_GPU/lib/c10.lib;C:/Program Files/NVIDIA Corporation/NvToolsExt/lib/x64/nvToolsExt64_1.lib;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/lib/x64/cudart_static.lib;D:/dev-soft/libtorch1.2.0_GPU/lib/caffe2_nvrtc.lib;D:/dev-soft/libtorch1.2.0_GPU/lib/c10_cuda.lib
-- OpenCV library status:
--     version: 4.0.1
--     libraries: opencv_calib3d;opencv_core;opencv_dnn;opencv_features2d;opencv_flann;opencv_gapi;opencv_highgui;opencv_imgcodecs;opencv_imgproc;opencv_ml;opencv_objdetect;opencv_photo;opencv_stitching;opencv_video;opencv_videoio;opencv_world
--     include path: D:/dev-soft/c_include/opencv4.0.1/build/include
-- Found JNI: C:/Program Files/Java/jdk1.8.0_131/lib/jawt.lib
-- JNI status:
--     libraries: C:/Program Files/Java/jdk1.8.0_131/include
-- Configuring done
-- Generating done
-- Build files have been written to: D:/WS/WS_CPP/torch_begin/libtorch-gpu/build

NickKao · September 16, 2019, 9:15am

That was my own mistake.
My environment PATH pointed to libtorch_cpu\lib.
Changing it to libtorch_gpu\lib fixed it.
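
The fix above comes down to which libtorch directory is found first on the search path. As a minimal sketch (not part of libtorch; the helper name libtorch_dirs_in_order is hypothetical), the following splits a PATH-style string and returns the entries that mention "libtorch", in search order, so that a CPU build shadowing a GPU build is easy to spot. It assumes the directory names contain the lowercase text "libtorch", as they do in this thread.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a libtorch API): split a PATH-style string on
// `sep` and return the entries that contain "libtorch", in search order.
// Assumption: the match is case-sensitive, so "LibTorch" would be missed.
std::vector<std::string> libtorch_dirs_in_order(const std::string& path_value,
                                                char sep) {
    std::vector<std::string> hits;
    std::string::size_type start = 0;
    while (start <= path_value.size()) {
        // Find the end of the current entry (or the end of the string).
        std::string::size_type end = path_value.find(sep, start);
        if (end == std::string::npos) end = path_value.size();
        std::string entry = path_value.substr(start, end - start);
        if (entry.find("libtorch") != std::string::npos) {
            hits.push_back(entry);
        }
        start = end + 1;  // Skip past the separator.
    }
    return hits;
}
```

On Windows the separator is ';', so passing the value of the PATH variable (for example from std::getenv("PATH"), after checking that the pointer is not null) lists the libtorch directories in PATH order. If a libtorch_cpu entry appears before the libtorch_gpu one, its DLLs are likely the ones being loaded from PATH.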

bfortuner · October 13, 2019, 4:20am

I have the same issue on Ubuntu 14.01: I can train on the GPU with the Python API, but not with the C++ API. I’m following the tutorial here: https://pytorch.org/tutorials/advanced/cpp_export.html. Any idea what I’m doing wrong?

Ubuntu: 14.01
gcc: 4.8.4
ldd: 2.19
Python: 3.6
Cuda: 10.0
Nvidia Driver: 410.78
Pytorch: https://download.pytorch.org/whl/cu100/torch-1.3.0%2Bcu100-cp36-cp36m-linux_x86_64.whl
Libtorch: https://download.pytorch.org/libtorch/cu100/libtorch-shared-with-deps-1.3.0.zip

When I run:

cmake -DCMAKE_PREFIX_PATH=/home/bfortuner/libtorch ..

-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
...

-- Found CUDA: /usr/local/cuda-10.0 (found version "10.0") 
-- Caffe2: CUDA detected: 10.0
-- Caffe2: CUDA nvcc is: /usr/local/cuda-10.0/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda-10.0
-- Caffe2: Header version is: 10.0
-- Found CUDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so  
-- Found cuDNN: v7.4.1 (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
-- Autodetected CUDA architecture(s): 6.1
-- Added CUDA NVCC flags for: -gencode;arch=compute_61,code=sm_61
-- Found torch: /home/bfortuner/libtorch/lib/libtorch.so

But when I try to load the model, it says:
CUDA driver version is insufficient for CUDA runtime version

But my CUDA version and driver are compatible when I use the Python API?

If I skip the model load and run torch::cuda::is_available(), I get false.

I’m sure there’s something simple I’m missing?
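
The error message above compares two numbers: the highest CUDA version the installed driver supports and the CUDA runtime version the binary is using. A minimal sketch of that comparison follows. It assumes the two integers come from the CUDA runtime calls cudaDriverGetVersion and cudaRuntimeGetVersion, which encode a version as 1000 * major + 10 * minor (so 10000 means 10.0). The struct and function names are hypothetical, and the rule shown is a simplification, not the runtime's actual check.

```cpp
#include <string>

// Hypothetical helper type, not part of CUDA or libtorch.
struct CudaVersion {
    int major_version;
    int minor_version;
};

// CUDA encodes versions as 1000 * major + 10 * minor, e.g. 10000 -> 10.0.
CudaVersion decode_cuda_version(int encoded) {
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Render a decoded version as "major.minor" for log output.
std::string format_cuda_version(const CudaVersion& v) {
    return std::to_string(v.major_version) + "." +
           std::to_string(v.minor_version);
}

// Simplified assumption: the driver is sufficient when the CUDA version it
// supports is at least the runtime version the program was built against.
bool driver_supports_runtime(int driver_encoded, int runtime_encoded) {
    return driver_encoded >= runtime_encoded;
}
```

If the Python process and the C++ binary report different runtime versions on the same machine, that suggests they are loading different CUDA libraries rather than that the driver itself has changed.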

bfortuner · October 13, 2019, 5:19am

I think it has to do with a version mismatch between libtorch and PyTorch (where I export the traced model), but I’m not sure how to tell which libtorch/PyTorch version combination I need for CUDA 10.0. I suspect this because GPUs are reported as available when I download this other version of libtorch, but then I’m unable to load my model.

https://download.pytorch.org/libtorch/cu100/libtorch-shared-with-deps-latest.zip

CUDA is available! Training on GPU.
terminate called after throwing an instance of 'c10::Error'
  what():  [enforce fail at inline_container.cc:137] . PytorchStreamReader failed closing reader: file not found
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x47 (0x7f92f905de17 in /home/bfortuner/libtorch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::valid(char const*) + 0x6b (0x7f92fbdbb4cb in /home/bfortuner/libtorch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::~PyTorchStreamReader() + 0x1f (0x7f92fbdbb51f in /home/bfortuner/libtorch/lib/libtorch.so)