Warning: There is an imbalance between your GPUs

xu_wang · April 28, 2018, 3:09am

When I was using DataParallel with multiple GPUs, the warning below appeared:

There is an imbalance between your GPUs. You may want to exclude GPU 0 which
has less than 75% of the memory or cores of GPU 1. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.

What does it mean?

ptrblck · April 28, 2018, 7:19pm

It means your machine has GPUs that differ a lot in their performance.
If you use DataParallel, the weaker GPUs will most likely be a bottleneck in your code. That's why you can exclude them using `device_ids` or `CUDA_VISIBLE_DEVICES`.

The code of the check and warning is defined here.
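As a rough illustration of choosing `device_ids` that leave out the weaker GPUs, here is a pure-Python sketch. It assumes you have already collected each GPU's total memory and multiprocessor count (in PyTorch these are exposed by `torch.cuda.get_device_properties(i)` as `total_memory` and `multi_processor_count`). The helper name `balanced_device_ids` and the sample numbers are hypothetical; the 0.75 threshold just mirrors the 75% mentioned in the warning.

```python
def balanced_device_ids(props, threshold=0.75):
    """Return the ids of GPUs whose memory and multiprocessor count are both
    at least `threshold` times those of the strongest GPU.

    props: dict mapping device id -> (total_memory, multi_processor_count).
    """
    max_mem = max(mem for mem, _ in props.values())
    max_sms = max(sms for _, sms in props.values())
    return [
        dev
        for dev, (mem, sms) in sorted(props.items())
        if mem / max_mem >= threshold and sms / max_sms >= threshold
    ]


# Hypothetical numbers: GPU 0 is a smaller card, GPUs 1 and 2 match.
example = {0: (6144, 10), 1: (12212, 24), 2: (12211, 24)}
print(balanced_device_ids(example))  # [1, 2]
```

The resulting list could then be passed as `DataParallel(model, device_ids=[1, 2])`. Alternatively, launching the script with `CUDA_VISIBLE_DEVICES=1,2` hides GPU 0 entirely; note that the visible GPUs are then renumbered starting from 0 inside the process.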

Mata_Fu · June 1, 2018, 1:22pm

Hi, I have the same problem here, but I'm pretty sure that all of my GPUs are the same.

Fri Jun  1 15:17:23 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.24                 Driver Version: 396.24                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:01:00.0 Off |                  N/A |
| 25%   55C    P8    17W / 250W |    887MiB / 12212MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 00000000:02:00.0 Off |                  N/A |
| 33%   65C    P8    20W / 250W |    475MiB / 12212MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  Off  | 00000000:04:00.0 Off |                  N/A |
| 22%   41C    P8    13W / 250W |     11MiB / 12211MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Mata_Fu · June 1, 2018, 1:46pm

By the way, have you solved this problem?

liankf · June 14, 2018, 2:35pm

I had the same warning, and I solved it with `from __future__ import division`, because in Python 2.*, 1/2 = 0 (integer division).
This matters in the function `warn_imbalance`:
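```python
import operator
import warnings

# Excerpt: dev_props, device_ids and imbalance_warn come from the enclosing balance-check function.
def warn_imbalance(get_prop):
    values = [get_prop(props) for props in dev_props]
    min_pos, min_val = min(enumerate(values), key=operator.itemgetter(1))
    max_pos, max_val = max(enumerate(values), key=operator.itemgetter(1))
    # Under Python 2 without `from __future__ import division`, int / int truncates.
    if min_val / max_val < 0.75:
        warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
        return True
    return False
```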

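Under Python 2, `min_val / max_val` is therefore 0 whenever the two (integer) values differ at all, so the warning fires even for nearly identical GPUs.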
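To see why the ratio collapses, here is a small sketch that emulates Python 2's `int / int` with Python 3's floor division `//`. The values are the total memory figures (in MiB) from the nvidia-smi output above; `ratio_below_threshold` is a hypothetical name for illustration, not part of PyTorch.

```python
def ratio_below_threshold(values, threshold=0.75, integer_division=False):
    """Return True if min(values) / max(values) is below `threshold`.

    integer_division=True emulates Python 2's int / int (floor division),
    which is how the check above behaves without
    `from __future__ import division`.
    """
    min_val, max_val = min(values), max(values)
    ratio = min_val // max_val if integer_division else min_val / max_val
    return ratio < threshold


mem_mib = [12212, 12212, 12211]  # three nearly identical GTX Titan cards
print(ratio_below_threshold(mem_mib, integer_division=True))   # True: 12211 // 12212 == 0
print(ratio_below_threshold(mem_mib, integer_division=False))  # False: ratio is about 0.9999
```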

seyeeet · March 20, 2020, 8:29pm

@ptrblck

Can you please help me with this issue?
I have 3 GPUs: one with 24GB and two with 11GB.
With a batch size of 4, I get a CUDA out-of-memory error. However, I am sure the total memory of the 3 GPUs is enough to handle it.
How can I send 2 samples to GPU 0, 1 sample to GPU 1, and 1 sample to GPU 2?

ptrblck · March 20, 2020, 8:41pm

You could manipulate or write a custom scatter and gather strategy based on this code.
I’m not aware of a simple method to specify the chunk sizes.
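As a possible starting point for such a custom strategy, here is a pure-Python sketch of just the first step: choosing per-GPU chunk sizes roughly proportional to each GPU's memory. It uses the largest-remainder method so the sizes are integers that always sum to the batch size. The function names are hypothetical, memory is only a rough proxy for how many samples a GPU can hold, and wiring the sizes into DataParallel's scatter/gather is not shown.

```python
import math


def proportional_chunk_sizes(batch_size, weights):
    """Split `batch_size` samples across devices in proportion to `weights`
    (e.g. GPU memory in GB) using the largest-remainder method, so the
    chunk sizes are integers that sum exactly to `batch_size`."""
    total = sum(weights)
    exact = [batch_size * w / total for w in weights]
    sizes = [math.floor(x) for x in exact]
    leftover = batch_size - sum(sizes)
    # Hand the remaining samples to the devices with the largest fractional parts.
    order = sorted(range(len(weights)), key=lambda i: exact[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


def split_batch(batch, sizes):
    """Cut a sequence into consecutive chunks of the given sizes."""
    chunks, start = [], 0
    for n in sizes:
        chunks.append(batch[start:start + n])
        start += n
    return chunks


sizes = proportional_chunk_sizes(4, [24, 11, 11])
print(sizes)                                         # [2, 1, 1]
print(split_batch(["s0", "s1", "s2", "s3"], sizes))  # [['s0', 's1'], ['s2'], ['s3']]
```

For an actual input tensor, `torch.split(batch, sizes)` accepts a list of chunk sizes along dim 0, so the same sizes could be used to cut the batch; moving each chunk to its GPU, running the replicas, and gathering the outputs would still be up to the custom scatter/gather code.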

seyeeet · March 20, 2020, 9:06pm

I'm a beginner at all of this. Are you aware of a simple code example that does that, so I can learn from it?