Extending this short thread: I am running a grid search over a small LSTM on an 8-GPU instance, and GPU utilization is low. Which is better for making more efficient use of the resources (given constant model complexity):
- Send multiple jobs to all GPUs, or
- Send multiple jobs to specific (non-overlapping) GPUs?
What are the pros/cons of each option (if there are any)?
In my experience GPUs don’t multitask very efficiently. It’s usually better to run jobs on non-overlapping GPUs (e.g. by setting CUDA_VISIBLE_DEVICES), since CUDA kernels are often written to take advantage of the entire GPU.
You may still get some speed-up from, say, running 2 jobs per GPU, but probably not 2x. Of course, this depends on your problem.
If you try multiple jobs per GPU, please report back what you see. I’m curious to know.
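To make the non-overlapping setup concrete, here is a minimal sketch of how one might pin one job per GPU via CUDA_VISIBLE_DEVICES. The script name `train.py` is a placeholder for your own training script, not anything from this thread:

```python
# Sketch: one hyperparameter-search job per GPU, each pinned to a single
# device via CUDA_VISIBLE_DEVICES so no two jobs share a GPU.
# "train.py" is a hypothetical stand-in for your own training script.
def make_commands(n_gpus, script="train.py"):
    """Build one shell command per GPU; each process sees exactly one device."""
    return [f"CUDA_VISIBLE_DEVICES={g} python {script}" for g in range(n_gpus)]

cmds = make_commands(8)
# To actually launch them in parallel, you could do something like:
#   procs = [subprocess.Popen(c, shell=True) for c in cmds]
#   for p in procs: p.wait()
print(cmds[0])  # CUDA_VISIBLE_DEVICES=0 python train.py
```

Because each process only sees its own device, PyTorch inside each job just uses device 0 and the jobs never contend for the same GPU.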
For others who might be interested: I tested running multiple jobs on all 8 GPUs in a single instance (a P2 instance on AWS with K80 GPUs). With a single job running, nvidia-smi shows about 2% usage per GPU. Starting a second model spikes the usage of all GPUs to 100% and the jobs choke: they literally stop running. So nope.
CUDA_VISIBLE_DEVICES works as expected, but I have a couple of questions:
I notice that torch.cuda.current_device() always returns 0, regardless of which GPUs are made visible in the environment. Not sure whether this is a problem, but I would have assumed that if only GPUs 4,5,6,7 are visible, then current_device() would be one of those?
With the exact same model settings as with 8 GPUs, a job on only 2 GPUs uses around 5-40% of each GPU, as expected. But an epoch takes around 7 minutes with 8 GPUs and only 2 minutes with 2. What gives?
If CUDA_VISIBLE_DEVICES=4,5,6,7, then gpu-4 becomes device 0, gpu-5 becomes device 1, and so on: the visible devices are renumbered from 0 inside the process, which is why current_device() reports 0.
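That renumbering can be sketched in pure Python (no GPU required); `visible_to_physical` is a hypothetical helper modelling the mapping, not a torch API:

```python
import os

# Model of the remapping CUDA applies: the devices listed in
# CUDA_VISIBLE_DEVICES are renumbered 0..N-1 in list order, so inside the
# process, logical device 0 (what current_device() returns) is the first
# entry in the list.
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"

def visible_to_physical(logical_id):
    """Map an in-process (logical) device id back to the physical GPU id."""
    visible = os.environ["CUDA_VISIBLE_DEVICES"]
    physical_ids = [int(x) for x in visible.split(",")]
    return physical_ids[logical_id]

print(visible_to_physical(0))  # 4: the process's device 0 is physical GPU 4
```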