CUDA not available after following DCGAN tutorial?

Hey, I am really impressed with how intuitive both of PyTorch’s APIs (Python and C++) are and want to use it at work, where we mainly do C++ development, but I am struggling to get the GAN demo going because of a weird issue.

  1. Setup: FastAI Paperspace Ubuntu instance with the latest version of PyTorch.
  2. When I open up the Python interpreter and run torch.cuda.is_available() -> TRUE
  3. When I do the same thing from C++ (exactly following this tutorial https://github.com/pytorch/examples/tree/master/cpp/dcgan) I get FALSE from torch::cuda::is_available() (a simplified sketch of the check is below) :?
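To rule out anything in my own files: the check behind this is essentially the same device selection the tutorial does. A simplified sketch of what I am running (the full code is in the repo linked below):

#include <torch/torch.h>
#include <iostream>

int main() {
  // Pick CUDA if libtorch reports it as available, otherwise fall back to the CPU.
  torch::Device device(torch::cuda::is_available() ? torch::kCUDA : torch::kCPU);
  std::cout << "Running on device: " << device << std::endl;  // prints "cpu" for me
  return 0;
}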

My CMakeLists.txt looks like this:

cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(train-gan)

find_package(Torch REQUIRED)
find_package(CUDA 10.1 REQUIRED)

add_executable(train-gan train_gan.cpp dis.cpp gen.cpp)

target_link_libraries(train-gan "${TORCH_LIBRARIES}")
target_link_libraries(train-gan "${CUDA_LIBRARIES}")

set_property(TARGET train-gan PROPERTY CXX_STANDARD 11)

You can find the full project on my GitHub -> https://github.com/skalaydzhiyski/cpp-gan

I am new to CMake (as is probably visible from the repo) and just want to get Torch to use the GPU from C++.

Let me know if you need any more information and thanks in advance.

Are you able to run the DCGAN example on the GPU or is it using your CPU?

I just cloned your repo and tried to run it.
It looks like my GPU was detected:

Number of colour channels: 4
Running on device: cuda

However, I get an error after these lines:

terminate called after throwing an instance of 'c10::Error'
  what():  Error opening images file at ./mnist/train-images-idx3-ubyte (read_images at /pytorch/torch/csrc/api/src/data/datasets/mnist.cpp:66)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7fa06a35bb91 in /home/pbialecki/libs/libtorch_nightly/libtorch/lib/libc10.so)
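For context, the dataset is loaded via a relative path, roughly like the sketch below (the folder name is taken from the error message above, the rest follows the upstream example), so this failure just means the MNIST files are missing from ./mnist relative to where I ran the binary. It is unrelated to the CUDA question.

#include <torch/torch.h>

int main() {
  // The MNIST reader resolves this path relative to the current working directory,
  // so the raw idx files have to be downloaded there first.
  auto dataset = torch::data::datasets::MNIST("./mnist")
                     .map(torch::data::transforms::Stack<>());
  return 0;
}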

Hey, thank you so much for the quick reply, I have literally been biting my nails :smiley:

I always get “Running on device: cpu”, and that is on a machine that gives me TRUE when I run cuda.is_available() from the Python interpreter AND is also recognized by nvidia-smi. It seems I have everything set up, but the application just doesn’t pick up that I have CUDA installed :?

Let me know if you need any screenshots/info/output whatever… I have been struggling to get this running for a week now.

And thanks again, of course.

Can you please share your setup … libraries, install directories, versions, anything?

I don’t care to run the DCGAN demo per se, I just need to get the C++ Torch API to find my CUDA and run on the GPU.

I have run multiple Python models on this setup and they all run fine on the GPU with little to no effort from my end.

Sure!

  • CUDA version: 10.1
  • CUDA driver: 418.56
  • TITAN V
  • nvcc in /usr/local/cuda-10.1/bin/nvcc
  • libtorch unzipped in ~/libs

Does cmake find your CUDA install at all?

Yes, CMake finds CUDA successfully … I think my CUDA driver is 4.10 though, do you think that might be an issue?

I have an Nvidia Quadro P5000 and the driver is 4.10… do you think it might have something to do with the issue?

Might be the reason.
This table gives you the compatible driver versions.
For CUDA 10.1, >=418.39 is recommended.

Could you try to update the driver and run the example again?

[screenshot of the CUDA toolkit / driver version compatibility table]
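If you want to double check what your system reports independently of libtorch, a small query against the CUDA runtime like the sketch below should print the driver and runtime versions (it only needs the CUDA runtime you are already linking via CUDA_LIBRARIES):

#include <cuda_runtime_api.h>
#include <iostream>

int main() {
  int driver_version = 0;
  int runtime_version = 0;
  // Highest CUDA version the installed driver supports, encoded as 1000*major + 10*minor
  // (e.g. 10010 for CUDA 10.1).
  cudaDriverGetVersion(&driver_version);
  // Version of the CUDA runtime this binary was built against, same encoding.
  cudaRuntimeGetVersion(&runtime_version);
  std::cout << "Driver supports up to CUDA: " << driver_version << "\n"
            << "CUDA runtime version:       " << runtime_version << std::endl;
  return 0;
}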

:X … I am planning on decommissioning my machine and requesting a new one to set up from scratch. Do you mind sharing the resources you used to set up your environment? It seems the issue is with how the libraries are linked, I think…

Thanks again for your time and the quick responses, I am desperate at this point.

[screenshots of the code]

That is the only code that I run…

Your Python install is not really related to this issue in libtorch.
E.g. I used my base conda environment without PyTorch installed, and could successfully build the C++ example.

Have you built PyTorch from source before?
If so, did you see any issues?

Maybe Peter Goldsborough (one of the PyTorch core devs) has an idea about this specific issue. CC @goldsborough

Hey @ptrblck,

Sorry to bother you again with this, but I have just got a completely new, clean machine and want to follow your steps for installing:

  1. Conda.
  2. PyTorch from source (as you mentioned)

Do you mind providing me with resource links where I can follow the procedure you used? I can find 50 different ones online and am not sure which one of them will work out.

Thanks in advance.

I’m not sure if I’m the right person to ask, as I’m installing everything in a pragmatic (and thus maybe not the best?) way.
Anyway,

  • Download the latest Conda package (use Python 3.7)
  • Select the NVIDIA driver from “Software & Updates -> Additional Drivers” on Ubuntu (I’m using 418)
  • Download and install CUDA

Let me know if you get stuck somewhere.

Hey,

I have followed EXACTLY what you said, but I still don’t get the “cuda” device showing up when running my code…

Should I build PyTorch from source?
And what environment variables should I set to make it work?

Thanks again for your time and for your patience.

Did you install some PyTorch binaries and are you able to create CUDATensors using Python?

Yes, I can create and run anything I want through Python and it is working like a charm, but when I run the example GAN C++ project I get false for cuda::is_available()…
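For reference, this is the kind of minimal check I mean (a sketch; the device_count and cudnn_is_available calls are just extra diagnostics, my actual project only checks is_available):

#include <torch/torch.h>
#include <iostream>

int main() {
  // All three should report a working CUDA backend if libtorch was built with
  // CUDA support and can see the driver; booleans print as 1/0.
  std::cout << "torch::cuda::is_available():       " << torch::cuda::is_available() << "\n"
            << "torch::cuda::device_count():       " << torch::cuda::device_count() << "\n"
            << "torch::cuda::cudnn_is_available(): " << torch::cuda::cudnn_is_available() << std::endl;
  return 0;
}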

That is so bizarre. I have tried multiple environments with multiple versions of CUDA / cuDNN / Torch on multiple machines - they ALL work fine with the Python installs and not with C++…