Is there any way to let the program select a free GPU automatically?

Hi guys, I’ve got a PC with two GPUs and am trying to run two networks on them in parallel. For this, when I run one of them I currently set torch.cuda.set_device(0), and torch.cuda.set_device(1) for the other one. Is there any function or command to determine which GPU is free and select it? Thank you very much~

I’m not aware of a straightforward way, so please correct me if there is a simpler solution.
For now you could call nvidia-smi inside your script and, based on its output, select the GPU you would like to use. The returned memory stats are wrapped in a pandas.DataFrame so that you can sort them etc.
Here is some sample code:

import subprocess
import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd

def get_free_gpu():
    gpu_stats = subprocess.check_output(["nvidia-smi", "--format=csv", "--query-gpu=memory.used,memory.free"])
    gpu_df = pd.read_csv(StringIO(gpu_stats),
                         names=['memory.used', 'memory.free'],
                         skiprows=1)
    print('GPU usage:\n{}'.format(gpu_df))
    # strip the ' MiB' suffix and convert to int so idxmax compares numerically
    gpu_df['memory.free'] = gpu_df['memory.free'].map(lambda x: int(x.rstrip(' [MiB]')))
    idx = gpu_df['memory.free'].idxmax()
    print('Returning GPU{} with {} free MiB'.format(idx, gpu_df.iloc[idx]['memory.free']))
    return idx

free_gpu_id = get_free_gpu()
torch.cuda.set_device(free_gpu_id)

Thanks for your reply, but I got such an error when I ran your code:

Traceback (most recent call last):
  File "/home/hdl2/Desktop/SonoFetalImage/playground.py", line 21, in <module>
    free_gpu_id = get_free_gpu()
  File "/home/hdl2/Desktop/SonoFetalImage/playground.py", line 12, in get_free_gpu
    gpu_df = pd.read_csv(StringIO(gpu_status),
TypeError: initial_value must be str or None, not bytes

How to solve it?

This error is thrown by StringIO, for example if you use from io import StringIO while running Python 2.
That’s strange, since I built in a version check.

Which Python version are you using?

Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 12:22:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux

I’m not sure why this error occurs, but it seems to be related to handling unicode strings.
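For reference, here is a minimal sketch of what appears to be going on (the echo command below just stands in for nvidia-smi): under Python 3, subprocess.check_output returns bytes, while io.StringIO only accepts str, so decoding the output first avoids the error.

```python
import subprocess
from io import StringIO

# Under Python 3, check_output returns bytes, not str:
raw = subprocess.check_output(["echo", "memory.used, memory.free"])
print(type(raw))  # <class 'bytes'>

# Passing the bytes straight to StringIO raises the error above:
# TypeError: initial_value must be str or None, not bytes

# Decoding first gives StringIO the str it expects:
buf = StringIO(raw.decode('utf-8'))
print(buf.getvalue())
```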

I tried the code with Python3.6 and it worked by adding the following bit:

gpu_df = pd.read_csv(StringIO(u"".join(gpu_stats)),
                     names=['memory.used', 'memory.free'],
                     skiprows=1)

Regardless, I seem to have found a simpler way, like this:

import os
import numpy as np

def get_freer_gpu():
    # dump each GPU's free-memory line from nvidia-smi to a temp file
    os.system('nvidia-smi -q -d Memory | grep -A4 GPU | grep Free > tmp')
    memory_available = [int(x.split()[2]) for x in open('tmp', 'r').readlines()]
    # return the index of the GPU with the most free memory
    return np.argmax(memory_available)

:smile:


Nice! It does indeed look a bit shorter :wink:


I think you can try this:

import os
def find_gpus(nums=6):
    # dump each GPU's free-memory line from nvidia-smi to a temp file
    os.system('nvidia-smi -q -d Memory | grep -A4 GPU | grep Free > tmp_free_gpus')
    with open('tmp_free_gpus', 'r') as lines_txt:
        frees = lines_txt.readlines()
        idx_freeMemory_pair = [(idx, int(x.split()[2]))
                               for idx, x in enumerate(frees)]
    # sort GPUs by free memory, largest first
    idx_freeMemory_pair.sort(key=lambda my_tuple: my_tuple[1], reverse=True)
    usingGPUs = [str(idx_memory_pair[0])
                 for idx_memory_pair in idx_freeMemory_pair[:nums]]
    usingGPUs = ','.join(usingGPUs)
    print('using GPU idx: #', usingGPUs)
    return usingGPUs

os.environ['CUDA_VISIBLE_DEVICES'] = find_gpus(nums=4)  # must be set before `import torch`

Hi,

Here’s a version which doesn’t write to an intermediate file and lets you set the VRAM threshold:

Automatic GPU Allocation (github.com)

This is based on @pen_good and @ptrblck 's versions - thanks!

Alex
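In case the link goes stale, here is a rough sketch of the same idea: no temporary file, and a configurable free-memory threshold. It relies on nvidia-smi's --query-gpu=index,memory.free --format=csv,noheader,nounits output; the threshold_mib parameter and function names here are mine, not necessarily what the linked repo uses.

```python
import subprocess

def parse_free_gpus(csv_text, threshold_mib=1024):
    """Parse 'index, memory.free' CSV lines (as produced by nvidia-smi with
    --format=csv,noheader,nounits) and return the indices of GPUs with at
    least threshold_mib MiB free."""
    free = []
    for line in csv_text.strip().splitlines():
        idx, mem = (field.strip() for field in line.split(','))
        if int(mem) >= threshold_mib:
            free.append(int(idx))
    return free

def get_free_gpus(threshold_mib=1024):
    # query index and free memory directly, no grep or temp file needed
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=index,memory.free',
         '--format=csv,noheader,nounits']).decode('utf-8')
    return parse_free_gpus(out, threshold_mib)
```

get_free_gpus()[0] could then be passed to torch.cuda.set_device, assuming at least one GPU clears the threshold.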

Today I found that this command no longer gives any output:

nvidia-smi -q -d Memory | grep -A4 GPU | grep Free

So here is an alternative:

import subprocess

def run_cmd(cmd):
    out = (subprocess.check_output(cmd, shell=True)).decode('utf-8')[:-1]
    return out

def get_free_gpu_indices():
    # list each GPU's memory section; every GPU contributes 5 lines of output
    out = run_cmd('nvidia-smi -q -d Memory | grep -A4 GPU')
    out = out.split('\n')[1:]
    out = [l for l in out if '--' not in l]

    total_gpu_num = len(out) // 5
    gpu_bus_ids = []
    for i in range(total_gpu_num):
        # the first line of each 5-line block holds the GPU's bus id
        gpu_bus_ids.append(out[i * 5].strip().split()[1])

    # bus ids of GPUs that currently have compute processes running
    out = run_cmd('nvidia-smi --query-compute-apps=gpu_bus_id --format=csv')
    gpu_bus_ids_in_use = out.split('\n')[1:]
    gpu_ids_in_use = []

    for bus_id in gpu_bus_ids_in_use:
        gpu_ids_in_use.append(gpu_bus_ids.index(bus_id))

    return [i for i in range(total_gpu_num) if i not in gpu_ids_in_use]

print(get_free_gpu_indices())

This gives the entire list of free GPUs available.
