I have 3 GPUs (CUDA_VISIBLE_DEVICES=0,1,2).
Why does torch.cuda.device_count() return only 1?
Show us the output:
%reset -f
from __future__ import print_function
from __future__ import division
import math
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import torch
import sys
print('__Python VERSION:', sys.version)
print('__pyTorch VERSION:', torch.__version__)
print('__CUDA VERSION')
from subprocess import call
# call(["nvcc", "--version"]) does not work
! nvcc --version
print('__CUDNN VERSION:', torch.backends.cudnn.version())
print('__Number CUDA Devices:', torch.cuda.device_count())
print('__Devices')
call(["nvidia-smi", "--format=csv", "--query-gpu=index,name,driver_version,memory.total,memory.used,memory.free"])
print('Active CUDA Device: GPU', torch.cuda.current_device())
print ('Available devices ', torch.cuda.device_count())
print ('Current cuda device ', torch.cuda.current_device())
__Python VERSION: 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:09:58)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
__pyTorch VERSION: 0.2.0_4
__CUDA VERSION
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
__CUDNN VERSION: 6021
__Number CUDA Devices: 3
__Devices
Active CUDA Device: GPU 0
Available devices 3
Current cuda device 0
use_cuda = torch.cuda.is_available()
FloatTensor = torch.cuda.FloatTensor if use_cuda else torch.FloatTensor
LongTensor = torch.cuda.LongTensor if use_cuda else torch.LongTensor
Tensor = FloatTensor
import pycuda
from pycuda import compiler
import pycuda.driver as drv
drv.init()
print("%d device(s) found." % drv.Device.count())
for ordinal in range(drv.Device.count()):
    dev = drv.Device(ordinal)
    print(ordinal, dev.name())
3 device(s) found.
0 GeForce GTX 1080 Ti
1 GeForce GTX 1080 Ti
2 GeForce GTX 1080 Ti
from pycuda import gpuarray
from pycuda.curandom import rand as curand
# -- initialize the device
import pycuda.autoinit
height = 100
width = 200
X = curand((height, width), np.float32)
X.flags.c_contiguous
print (type(X))
<class 'pycuda.gpuarray.GPUArray'>
torch.cuda.device_count()
3
In this file, torch.cuda.device_count() works fine.
It seems only the notebook I was working on couldn't return the right number of CUDA devices; I don't know why.
After I duplicated that file, it works... thank you!
My pleasure. Let me know if you need anything else.
I'm having the same problem and I'm wondering if there have been any updates to make it easier for PyTorch to find my GPUs. I have two:
Microsoft Remote Display Adapter 0
NVIDIA GeForce RTX 2070 SUPER 4293918720 4095
Microsoft Remote Display Adapter 0
NVIDIA GeForce RTX 2070 SUPER 4293918720 4095
But when I run torch.cuda.device_count(), I get 1. Everything runs smoothly on the one GPU, but I'd like to utilize both. I tried following the advice above, but the link to the Jupyter notebook appears to be broken, and when I run the code pasted in this question, I can't install the package pycuda. Since this is from three years ago, I'm wondering if there has been an update.
Do you have CUDA_VISIBLE_DEVICES set in the environment from which you launch your program (or set by some script that launches your Python program)? That makes the other devices disappear from all subsequent CUDA calls, including from PyTorch. This illustrates nicely how CUDA_VISIBLE_DEVICES works.
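For example, here is a minimal sketch (assuming a machine with 3 GPUs) of how the masking behaves; note the variable must be set before the CUDA context is initialized, i.e. before the first torch.cuda call:

import os

# Expose only physical GPU 0 to this process.
# This must happen before CUDA is initialized (before any torch.cuda call).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

print(torch.cuda.device_count())    # reports 1, even on a 3-GPU machine
print(torch.cuda.current_device())  # logical device 0 now maps to physical GPU 0

Setting CUDA_VISIBLE_DEVICES=0,1,2 (or unsetting it) and restarting the process should make all of your GPUs visible to torch.cuda.device_count() again.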