Is there any way to let the program select a free GPU automatically?

Hi guys, I’ve got a PC with two GPUs and am trying to run two networks on them in parallel. For this, when I run one of them I currently set torch.cuda.set_device(0), and torch.cuda.set_device(1) for the other one. Is there any function or command to determine which GPU is free and select it? Thank you very much~

I’m not aware of a straightforward way, so please correct me if there is a simpler solution.
For now you could call nvidia-smi inside your script and, based on its output, select the GPU you would like to use. The returned memory stats are wrapped in a pandas.DataFrame so that you can sort them etc.
Here is some sample code:

import subprocess
import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd

def get_free_gpu():
    gpu_stats = subprocess.check_output(["nvidia-smi", "--format=csv", "--query-gpu=memory.used,memory.free"])
    gpu_df = pd.read_csv(StringIO(gpu_stats),
                         names=['memory.used', 'memory.free'],
                         skiprows=1)
    print('GPU usage:\n{}'.format(gpu_df))
    # strip the ' MiB' suffix and convert to int so idxmax compares numerically
    gpu_df['memory.free'] = gpu_df['memory.free'].map(lambda x: int(x.rstrip(' [MiB]')))
    idx = gpu_df['memory.free'].idxmax()
    print('Returning GPU{} with {} free MiB'.format(idx, gpu_df.iloc[idx]['memory.free']))
    return idx

free_gpu_id = get_free_gpu()
torch.cuda.set_device(free_gpu_id)

Thanks for your reply, but I got such an error when I ran your code:

Traceback (most recent call last):
  File "/home/hdl2/Desktop/SonoFetalImage/playground.py", line 21, in <module>
    free_gpu_id = get_free_gpu()
  File "/home/hdl2/Desktop/SonoFetalImage/playground.py", line 12, in get_free_gpu
    gpu_df = pd.read_csv(StringIO(gpu_status),
TypeError: initial_value must be str or None, not bytes

How to solve it?

This error is thrown by StringIO, for example if you use from io import StringIO while running Python 2.
That’s strange, since I built in a version check.

Which Python version are you using?

Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 12:22:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux

I’m not sure why this error occurs, but it seems to be related to handling unicode strings.
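For reference, here is a minimal sketch of what appears to be going on (the echo command below just stands in for nvidia-smi): under Python 3, subprocess.check_output returns bytes, while io.StringIO only accepts str, so decoding the output first avoids the error.

```python
import subprocess
from io import StringIO

# Under Python 3, check_output returns bytes, not str:
raw = subprocess.check_output(["echo", "memory.used, memory.free"])
print(type(raw))  # <class 'bytes'>

# Passing the bytes straight to StringIO raises the error above:
# TypeError: initial_value must be str or None, not bytes

# Decoding first gives StringIO the str it expects:
buf = StringIO(raw.decode('utf-8'))
print(buf.getvalue())
```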

I tried the code with Python3.6 and it worked by adding the following bit:

gpu_df = pd.read_csv(StringIO(u"".join(gpu_stats)),
                     names=['memory.used', 'memory.free'],
                     skiprows=1)

Regardless, I seem to have found a simpler way, like this:

import os
import numpy as np

def get_freer_gpu():
    # dump each GPU's free-memory line from nvidia-smi to a temp file
    os.system('nvidia-smi -q -d Memory | grep -A4 GPU | grep Free > tmp')
    memory_available = [int(x.split()[2]) for x in open('tmp', 'r').readlines()]
    # return the index of the GPU with the most free memory
    return np.argmax(memory_available)

:smile:


Nice! It does indeed look a bit shorter :wink:


I think you can try this:

import os
def find_gpus(nums=6):
    # dump each GPU's free-memory line from nvidia-smi to a temp file
    os.system('nvidia-smi -q -d Memory | grep -A4 GPU | grep Free > tmp_free_gpus')
    with open('tmp_free_gpus', 'r') as lines_txt:
        frees = lines_txt.readlines()
        idx_freeMemory_pair = [(idx, int(x.split()[2]))
                               for idx, x in enumerate(frees)]
    # sort GPUs by free memory, largest first
    idx_freeMemory_pair.sort(key=lambda my_tuple: my_tuple[1], reverse=True)
    usingGPUs = [str(idx_memory_pair[0])
                 for idx_memory_pair in idx_freeMemory_pair[:nums]]
    usingGPUs = ','.join(usingGPUs)
    print('using GPU idx: #', usingGPUs)
    return usingGPUs

os.environ['CUDA_VISIBLE_DEVICES'] = find_gpus(nums=4)  # must be set before `import torch`

Hi,

Here’s a version which doesn’t write to an intermediate file and lets you set the VRAM threshold:

Automatic GPU Allocation (github.com)

This is based on @pen_good and @ptrblck 's versions - thanks!

Alex
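In case the link goes stale, here is a rough sketch of the same idea: no temporary file, and a configurable free-memory threshold. It relies on nvidia-smi's --query-gpu=index,memory.free --format=csv,noheader,nounits output; the threshold_mib parameter and function names here are mine, not necessarily what the linked repo uses.

```python
import subprocess

def parse_free_gpus(csv_text, threshold_mib=1024):
    """Parse 'index, memory.free' CSV lines (as produced by nvidia-smi with
    --format=csv,noheader,nounits) and return the indices of GPUs with at
    least threshold_mib MiB free."""
    free = []
    for line in csv_text.strip().splitlines():
        idx, mem = (field.strip() for field in line.split(','))
        if int(mem) >= threshold_mib:
            free.append(int(idx))
    return free

def get_free_gpus(threshold_mib=1024):
    # query index and free memory directly, no grep or temp file needed
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=index,memory.free',
         '--format=csv,noheader,nounits']).decode('utf-8')
    return parse_free_gpus(out, threshold_mib)
```

get_free_gpus()[0] could then be passed to torch.cuda.set_device, assuming at least one GPU clears the threshold.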

Today I found that this command no longer gives any output:

nvidia-smi -q -d Memory | grep -A4 GPU | grep Free

So here is an alternative:

import subprocess

def run_cmd(cmd):
    out = (subprocess.check_output(cmd, shell=True)).decode('utf-8')[:-1]
    return out

def get_free_gpu_indices():
    # list each GPU's memory section; every GPU contributes 5 lines of output
    out = run_cmd('nvidia-smi -q -d Memory | grep -A4 GPU')
    out = out.split('\n')[1:]
    out = [l for l in out if '--' not in l]

    total_gpu_num = len(out) // 5
    gpu_bus_ids = []
    for i in range(total_gpu_num):
        # the first line of each 5-line block holds the GPU's bus id
        gpu_bus_ids.append(out[i * 5].strip().split()[1])

    # bus ids of GPUs that currently have compute processes running
    out = run_cmd('nvidia-smi --query-compute-apps=gpu_bus_id --format=csv')
    gpu_bus_ids_in_use = out.split('\n')[1:]
    gpu_ids_in_use = []

    for bus_id in gpu_bus_ids_in_use:
        gpu_ids_in_use.append(gpu_bus_ids.index(bus_id))

    return [i for i in range(total_gpu_num) if i not in gpu_ids_in_use]

print(get_free_gpu_indices())

This gives the entire list of free GPUs available.
