Hi guys, I have a PC with two GPUs and I'm trying to run two networks on them in parallel. Right now, when I run one of them, I set torch.cuda.set_device(0),
and torch.cuda.set_device(1)
for the other one. Is there any function or command to check which GPU is free and select it? Thank you very much~
I’m not aware of a straightforward way, so please correct me if there is a simpler solution.
For now you could call nvidia-smi
inside your script and select the GPU you would like to use based on its output. The returned memory stats will be wrapped into a pandas.DataFrame
so that you can sort it etc.
Here is some sample code:
import subprocess
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd
import torch

def get_free_gpu():
    # Query per-GPU memory stats from nvidia-smi in CSV form.
    gpu_stats = subprocess.check_output(["nvidia-smi", "--format=csv", "--query-gpu=memory.used,memory.free"])
    gpu_df = pd.read_csv(StringIO(gpu_stats),
                         names=['memory.used', 'memory.free'],
                         skiprows=1)
    print('GPU usage:\n{}'.format(gpu_df))
    # Strip the ' MiB' suffix so the values compare numerically, not as strings.
    gpu_df['memory.free'] = gpu_df['memory.free'].map(lambda x: int(x.rstrip(' [MiB]')))
    idx = gpu_df['memory.free'].idxmax()
    print('Returning GPU{} with {} free MiB'.format(idx, gpu_df.iloc[idx]['memory.free']))
    return idx

free_gpu_id = get_free_gpu()
torch.cuda.set_device(free_gpu_id)
Thanks for your reply, but I got this error when I ran your code:
Traceback (most recent call last):
File "/home/hdl2/Desktop/SonoFetalImage/playground.py", line 21, in <module>
free_gpu_id = get_free_gpu()
File "/home/hdl2/Desktop/SonoFetalImage/playground.py", line 12, in get_free_gpu
gpu_df = pd.read_csv(StringIO(gpu_status),
TypeError: initial_value must be str or None, not bytes
How can I solve it?
This error is thrown by StringIO,
for example if you imported from io import StringIO
while using Python 2.
That’s strange, since I built in a version check.
Which Python version are you using?
Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 12:22:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
I’m not sure why this error occurs, but it seems to be related to handling unicode strings.
I tried the code with Python 3.6 and it worked after changing the following bit:

gpu_df = pd.read_csv(StringIO(u"".join(gpu_stats)),
                     names=['memory.used', 'memory.free'],
                     skiprows=1)
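Alternatively, decoding the bytes up front should avoid the issue entirely on Python 3, since subprocess.check_output returns bytes there while io.StringIO only accepts str. A minimal sketch of this workaround (my own suggestion, not tested in the thread):

```python
import subprocess
from io import StringIO

# On Python 3, check_output returns bytes, but io.StringIO only accepts str,
# which is exactly what "initial_value must be str or None, not bytes" complains about.
gpu_stats = subprocess.check_output(
    ["nvidia-smi", "--format=csv", "--query-gpu=memory.used,memory.free"]
).decode("utf-8")
buf = StringIO(gpu_stats)  # now safe to pass to pd.read_csv
```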
In any case, I seem to have found a simpler way, like this:

import os
import numpy as np

def get_freer_gpu():
    # Dump the free-memory line for each GPU into a temp file, then pick the largest.
    os.system('nvidia-smi -q -d Memory |grep -A4 GPU|grep Free >tmp')
    memory_available = [int(x.split()[2]) for x in open('tmp', 'r').readlines()]
    return np.argmax(memory_available)
Nice! It does indeed look a bit shorter.
I think you can try this:

import os

def find_gpus(nums=6):
    # Write each GPU's free memory to a temp file, then rank the GPUs by it.
    os.system('nvidia-smi -q -d Memory |grep -A4 GPU|grep Free >tmp_free_gpus')
    with open('tmp_free_gpus', 'r') as lines_txt:
        frees = lines_txt.readlines()
    idx_freeMemory_pair = [(idx, int(x.split()[2]))
                           for idx, x in enumerate(frees)]
    idx_freeMemory_pair.sort(key=lambda my_tuple: my_tuple[1], reverse=True)
    usingGPUs = [str(idx_memory_pair[0])
                 for idx_memory_pair in idx_freeMemory_pair[:nums]]
    usingGPUs = ','.join(usingGPUs)
    print('using GPU idx: #', usingGPUs)
    return usingGPUs

os.environ['CUDA_VISIBLE_DEVICES'] = find_gpus(nums=4)  # must come before importing torch
Hi,
Here’s a version which doesn’t write to an intermediate file and lets you set the VRAM threshold:
Automatic GPU Allocation (github.com)
This is based on @pen_good and @ptrblck 's versions - thanks!
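For anyone who doesn't want to click through, here's a rough sketch of the idea (my own reconstruction, not the linked code; the function name and default threshold are illustrative):

```python
import subprocess

def pick_gpus(min_free_mib=4000):
    # Query free memory per GPU directly from nvidia-smi, no temp file needed.
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=memory.free',
         '--format=csv,noheader,nounits']).decode('utf-8')
    free = [int(x) for x in out.strip().split('\n')]
    # Keep only GPUs with at least min_free_mib MiB of free VRAM.
    return [i for i, mib in enumerate(free) if mib >= min_free_mib]

print(pick_gpus(min_free_mib=4000))
```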
Alex
Today I found that this command does not give any output anymore:
nvidia-smi -q -d Memory |grep -A4 GPU|grep Free
import subprocess

def run_cmd(cmd):
    out = (subprocess.check_output(cmd, shell=True)).decode('utf-8')[:-1]
    return out

def get_free_gpu_indices():
    # List the memory section for every GPU; each GPU contributes 5 lines here.
    out = run_cmd('nvidia-smi -q -d Memory | grep -A4 GPU')
    out = (out.split('\n'))[1:]
    out = [l for l in out if '--' not in l]
    total_gpu_num = int(len(out) / 5)
    gpu_bus_ids = []
    for i in range(total_gpu_num):
        # The bus id is the second token on each GPU header line.
        gpu_bus_ids.append([l.strip().split()[1] for l in out[i * 5:i * 5 + 1]][0])

    # Find the bus ids that currently have compute processes running on them.
    out = run_cmd('nvidia-smi --query-compute-apps=gpu_bus_id --format=csv')
    gpu_bus_ids_in_use = (out.split('\n'))[1:]
    gpu_ids_in_use = []
    for bus_id in gpu_bus_ids_in_use:
        gpu_ids_in_use.append(gpu_bus_ids.index(bus_id))

    return [i for i in range(total_gpu_num) if i not in gpu_ids_in_use]

print(get_free_gpu_indices())
This gives you the entire list of free GPUs.
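A possible way to use it (my own addition; this assumes you just want the first fully free GPU, set before importing torch):

```python
import os

free_gpus = get_free_gpu_indices()
if free_gpus:
    # Restrict this process to the first fully free GPU.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(free_gpus[0])
```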
Hello, I was browsing this thread trying to find the best way to deal with this, since I'm working on a shared workstation that has multiple devices. So I made this package that auto-selects the optimal CUDA device based on available memory, utilization, or power usage.
To install:
pip install cuda-selector
To use:
from cuda_selector import auto_cuda

# Select the CUDA device with the most available memory
device = auto_cuda()

# Select the CUDA device with the lowest power usage
device = auto_cuda('power')

# Select the CUDA device with the lowest utilization
device = auto_cuda('utilization')
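Assuming auto_cuda returns a torch-compatible device identifier such as 'cuda:0' (an assumption on my part, please check the package docs), you could then move your model onto it:

```python
import torch
from cuda_selector import auto_cuda

device = auto_cuda()                       # assumed: returns e.g. 'cuda:0'
model = torch.nn.Linear(8, 2).to(device)   # toy model, just to show usage
```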