How to get debug info from `torch.cuda.is_available()` about missing libs


There are already a ton of posts on the topic of "torch is not recognizing the GPU backend".
Each of the solutions offered is case-specific (such as upgrading or downgrading CUDA or the driver to some magic number for a specific torch version; see 1, 2, 3, 4, 5, 6 …), and none of those answers seems generic enough to help most people, so the same question keeps coming back with each new version of torch.

So, here is what I learned:


$ python -c 'import torch; print(torch.cuda.is_available())'

But why does it report False? nvidia-smi shows the GPU correctly.

One sure-shot way of diagnosing my CUDA library compatibility problems (not a desirable one, though!) is to ask TensorFlow:

$ python -c 'import tensorflow as tf; print(tf.test.is_gpu_available())'

2019-11-18 22:56:28.982050: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties:
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:00:06.0
2019-11-18 22:56:28.986522: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory
2019-11-18 22:56:28.989327: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory
2019-11-18 22:56:28.992141: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory
2019-11-18 22:56:28.994862: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory
2019-11-18 22:56:28.997474: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory
2019-11-18 22:56:29.000329: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory
2019-11-18 22:56:29.003020: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory
2019-11-18 22:56:29.003104: W tensorflow/core/common_runtime/gpu/] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2019-11-18 22:56:29.003232: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-18 22:56:29.003284: I tensorflow/core/common_runtime/gpu/]      0
2019-11-18 22:56:29.003372: I tensorflow/core/common_runtime/gpu/] 0:   N

TF states precisely "Could not load dynamic library '<name here>'" for each library it cannot open (e.g. '' or ''), which is crucial information for knowing which CUDA libs and versions are missing.

Is there a way to get such debug info from torch? If so, please let me know (we would be pleased not to depend on TF to fix this). Should torch.cuda.is_available() take a debug=True argument that prints which missing libraries are causing it to return False?

With this, if we figure out that the missing libs and versions are:
Could not load dynamic library ''; or ''
we can do

conda install cudnn=7 cudatoolkit=10.0 -c anaconda

and then torch recognizes the GPU backend for sure:

$ python -c 'import torch; print(torch.cuda.is_available())'

But we first have to know that the missing libs are '' and '' in order to install cudnn=7 cudatoolkit=10.0 (otherwise it comes down to trial and error and magic numbers).

Also, here is another case where extra info from debug=True would help:
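Until torch reports this itself, the same kind of check can be approximated without TF. The following is only a minimal sketch: it asks the dynamic loader to open each library name you pass on the command line and prints the loader's error for the ones that fail. The function name check_shared_libs and the script name are made up for illustration, and this does not reflect how torch itself decides what is_available() returns.

```python
import ctypes
import sys


def check_shared_libs(names):
    """Map each library name to None if it can be loaded, else the loader error."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # asks the dynamic loader to open the library
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # Library names come from the command line; nothing is assumed here.
    for lib, err in check_shared_libs(sys.argv[1:]).items():
        print(lib, "OK" if err is None else f"MISSING ({err})")
```

Saved as, say, check_libs.py (a hypothetical name), it would be run with the library names taken from your own log as arguments.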

$ python -c 'import torch; print(torch.cuda.is_available(), torch.version.cuda)'
False 10.0.130
$ python -c 'import tensorflow as tf; print(tf.test.is_gpu_available())'
E tensorflow/stream_executor/cuda/] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error

I guess cuInit: CUDA_ERROR_UNKNOWN: unknown error means I have to restart the machine. Knowing about these errors would help as well.
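The cuInit result can also be read directly from the driver library, without TF. This is a rough sketch under a few assumptions: a Linux machine where the driver library is named libcuda.so.1, and a helper name (cu_init_status) that is invented here. It calls the driver API functions cuInit and cuGetErrorName through ctypes and returns the error name as text.

```python
import ctypes


def cu_init_status(lib_name="libcuda.so.1"):
    """Return the name of the cuInit result, or a message if the driver lib is missing."""
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError as exc:
        return f"could not load {lib_name}: {exc}"
    code = libcuda.cuInit(0)  # 0 means CUDA_SUCCESS
    if code == 0:
        return "CUDA_SUCCESS"
    name = ctypes.c_char_p()
    # cuGetErrorName writes a pointer to a static string such as CUDA_ERROR_UNKNOWN
    if libcuda.cuGetErrorName(code, ctypes.byref(name)) == 0 and name.value:
        return name.value.decode()
    return f"cuInit failed with code {code}"


if __name__ == "__main__":
    print(cu_init_status())
```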


You can try to actually create a tensor on the GPU to see such errors: torch.rand(1, device="cuda").
Does that give you the information you want?
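As a small sketch of that suggestion, the allocation can be wrapped so the error text is captured instead of crashing the script. The helper name cuda_diagnosis and its probe parameter are made up here; the default probe is exactly the torch.rand(1, device="cuda") call above.

```python
def cuda_diagnosis(probe=None):
    """Return None if the probe succeeds, else the error it raised as text."""
    if probe is None:
        def probe():
            import torch
            torch.rand(1, device="cuda")  # forces CUDA initialization
    try:
        probe()
    except Exception as exc:  # torch raises different exception types here
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    print(cuda_diagnosis() or "CUDA allocation succeeded")
```

The probe argument is only there so the helper can be tried without a GPU; in normal use you would call cuda_diagnosis() with no arguments.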

Yes, that helps. It prints useful info. Thanks.
I wish it were a documented feature!

My code branching has been:

import torch
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")  # CUDA is never attempted on this path

so I never had the chance to see that message.