Not detecting GPU RTX A4000

Hello
I am trying to install PyTorch on Linux Mint 21 and use it with an RTX A4000. First I installed all the drivers and CUDA (from cuda_12.2.1_535.86.10_linux.run). Here are some outputs (local user, not root):

$ nvidia-smi
Mon Aug 7 08:50:44 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05 Driver Version: 535.86.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA RTX A4000 Off | 00000000:01:00.0 Off | Off |
| 41% 40C P8 15W / 140W | 103MiB / 16376MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1143 G /usr/lib/xorg/Xorg 97MiB |
+---------------------------------------------------------------------------------------+

$ /usr/local/cuda/extras/demo_suite/deviceQuery
Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA RTX A4000"
CUDA Driver Version / Runtime Version 12.2 / 12.2
CUDA Capability Major/Minor version number: 8.6
Total amount of global memory: 16101 MBytes (16882663424 bytes)
(48) Multiprocessors, (128) CUDA Cores/MP: 6144 CUDA Cores
GPU Max Clock rate: 1560 MHz (1.56 GHz)
Memory Clock rate: 7001 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 12.2, NumDevs = 1, Device0 = NVIDIA RTX A4000
Result = PASS

So it seems to me the NVIDIA drivers are OK and the CUDA installation is fine.
I've installed Anaconda and then created a dl_pytorch environment (with all Anaconda packages). To install PyTorch I used the following command:

(dl_pytorch) $ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 cudatoolkit -c pytorch -c nvidia

(I am not sure whether I have to install cudatoolkit, so I tried running the above command both with and without it. Neither worked.)
After installation, I tried to run the following command:
(dl_pytorch) $ python -c "import torch; print(torch.cuda.is_available())"
False

I ran all of the above commands as a regular (non-root) user. I am using Linux Mint. Any clues?

Thanks

Check whether you've installed the right PyTorch binaries shipping with CUDA by checking torch.version.cuda, which should return the selected CUDA version. If it returns None, you've installed the CPU-only binary.
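As a concrete check, something along these lines will report which build is installed (a minimal sketch, guarded so it also reports a missing install rather than crashing):

```python
# Minimal diagnostic sketch: report which PyTorch build is installed.
try:
    import torch
    build_cuda = torch.version.cuda  # None for the CPU-only build, e.g. "11.8" for a CUDA build
    cuda_ok = torch.cuda.is_available()
    print("torch:", torch.__version__,
          "| built with CUDA:", build_cuda,
          "| CUDA available:", cuda_ok)
except ImportError:
    build_cuda, cuda_ok = None, False
    print("torch is not installed in this environment")
```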

OK, I am using the CPU version. But how can I force it to install the GPU version? When I use conda search pytorch -c pytorch, I get:

pytorch 2.0.1 py3.11_cpu_0 pytorch
pytorch 2.0.1 py3.11_cuda11.7_cudnn8.5.0_0 pytorch
pytorch 2.0.1 py3.11_cuda11.8_cudnn8.7.0_0 pytorch

So I think I have to install pytorch=2.0.1=py3.11_cuda11.8_cudnn8.7.0_0.
Is that correct?

BTW, now I am wondering: do I need to install cuDNN? I did not install it (just CUDA).
thanks in advance

No, neither your locally installed CUDA toolkit nor cuDNN will be used, as PyTorch ships with its own CUDA dependencies.
Use the posted install commands from the install matrix and it should work.
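Once reinstalled, a quick way to verify the build actually sees the GPU (a minimal sketch, guarded so it also runs cleanly when torch is missing; the device-name line only executes when CUDA is available):

```python
# Sanity check after reinstalling: confirm the CUDA build detects the GPU.
try:
    import torch
    gpu_ok = torch.cuda.is_available()
    if gpu_ok:
        # On your machine this should name the card, i.e. the RTX A4000.
        print("CUDA device:", torch.cuda.get_device_name(0))
    else:
        print("CUDA is still not available")
except ImportError:
    gpu_ok = False
    print("torch is not installed in this environment")
```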

$ conda install pytorch=2.0.1=py3.11_cuda11.8_cudnn8.7.0_0 torchvision torchaudio pytorch-cuda=11.8 cudatoolkit -c pytorch -c nvidia

It worked perfectly! Thanks.
