Warning: There is an imbalance between your GPUs

When I was using DataParallel with multiple GPUs, the warning below appeared:

There is an imbalance between your GPUs. You may want to exclude GPU 0 which
has less than 75% of the memory or cores of GPU 1. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.

What does it mean?

It means your machine has different GPUs whose performance differs a lot.
If you use DataParallel, the weaker GPUs will most likely become a bottleneck in your code. That's why you can simply exclude the weaker GPUs using device_ids or CUDA_VISIBLE_DEVICES.
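
As a rough illustration of both options, here is a minimal sketch. The helper devices_to_keep and the memory figures are made up for this example and are not part of PyTorch; only the CUDA_VISIBLE_DEVICES variable and the device_ids argument come from the warning itself.

```python
import os

def devices_to_keep(memory_by_device, threshold=0.75):
    """Hypothetical helper: ids of devices with at least `threshold` of the largest memory."""
    largest = max(memory_by_device.values())
    return sorted(d for d, mem in memory_by_device.items() if mem >= threshold * largest)

# Assumed machine for illustration: GPU 0 is much smaller than GPUs 1 and 2 (memory in MiB).
memory_by_device = {0: 6144, 1: 12212, 2: 12211}
keep = devices_to_keep(memory_by_device)

# Option 1: hide the weak GPU from CUDA. Set this before CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in keep)

# Option 2: pass the ids to DataParallel instead (needs PyTorch and a model):
# model = torch.nn.DataParallel(model, device_ids=keep)
```

Use one option or the other. Once CUDA_VISIBLE_DEVICES is set, the remaining GPUs are renumbered starting from 0, so the original ids no longer apply inside the process.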

The code of the check and warning is defined here.

Hi, I have the same problem here, but I'm pretty sure that all of my GPUs are the same.

Fri Jun  1 15:17:23 2018       
| NVIDIA-SMI 396.24                 Driver Version: 396.24                    |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX TIT...  Off  | 00000000:01:00.0 Off |                  N/A |
| 25%   55C    P8    17W / 250W |    887MiB / 12212MiB |      0%      Default |
|   1  GeForce GTX TIT...  Off  | 00000000:02:00.0 Off |                  N/A |
| 33%   65C    P8    20W / 250W |    475MiB / 12212MiB |      0%      Default |
|   2  GeForce GTX TIT...  Off  | 00000000:04:00.0 Off |                  N/A |
| 22%   41C    P8    13W / 250W |     11MiB / 12211MiB |      0%      Default |

By the way, have you solved this problem?

I had the same warning, and I solved it by adding "from __future__ import division", because in Python 2.*, 1/2 = 0.
So in the function warn_imbalance:

    def warn_imbalance(get_prop):
        values = [get_prop(props) for props in dev_props]
        min_pos, min_val = min(enumerate(values), key=operator.itemgetter(1))
        max_pos, max_val = max(enumerate(values), key=operator.itemgetter(1))
        if min_val / max_val < 0.75:
            warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
            return True
        return False

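With integer division, min_val / max_val becomes zero as soon as the two values differ at all, so the check fires even when the GPUs are nearly identical.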
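
To make the difference concrete, here is a small sketch that compares the two kinds of division. It uses the memory totals from the nvidia-smi output above purely for illustration (the real check reads device properties), and the two function names are made up for this example.

```python
def ratio_py2_style(min_val, max_val):
    # `//` reproduces what `/` did for two ints in Python 2.
    return min_val // max_val

def ratio_true_division(min_val, max_val):
    # `/` after `from __future__ import division` (and always in Python 3).
    return min_val / max_val

# Illustrative values: total memory in MiB of two almost identical GPUs.
min_val, max_val = 12211, 12212

print(ratio_py2_style(min_val, max_val) < 0.75)      # True: the warning fires
print(ratio_true_division(min_val, max_val) < 0.75)  # False: no warning
```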



Can you please help me with this issue?
I have 3 GPUs: one is 24GB and two are 11GB.
When I use a batch size of 4, I get a CUDA memory error. However, I am sure the total memory of the 3 GPUs is enough to handle it.
How can I make sure to send 2 samples to GPU 0, 1 sample to GPU 1, and 1 sample to GPU 2?

You could manipulate or write a custom scatter and gather strategy based on this code.
I’m not aware of a simple method to specify the chunk sizes.
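
As a starting point, here is a minimal sketch of just the chunk-size part, in plain Python. Both function names are hypothetical, and this is not how DataParallel's scatter works internally; you would still have to plug the sizes into your own scatter logic.

```python
def proportional_chunk_sizes(batch_size, weights):
    """Split batch_size into integer chunks roughly proportional to weights."""
    total = sum(weights)
    quotas = [batch_size * w / total for w in weights]
    sizes = [int(q) for q in quotas]  # round each quota down first
    leftover = batch_size - sum(sizes)
    # Give the remaining samples to the devices with the largest fractional part.
    order = sorted(range(len(weights)), key=lambda i: quotas[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes

def split_batch(batch, sizes):
    """Cut a sequence into consecutive chunks of the given sizes."""
    chunks, start = [], 0
    for size in sizes:
        chunks.append(batch[start:start + size])
        start += size
    return chunks

# Batch of 4 samples, GPUs weighted by memory in GB (24, 11, 11).
sizes = proportional_chunk_sizes(4, [24, 11, 11])
chunks = split_batch(["s0", "s1", "s2", "s3"], sizes)
print(sizes)   # [2, 1, 1]
print(chunks)  # [['s0', 's1'], ['s2'], ['s3']]
```

For tensors, torch.split accepts a list of section sizes, so the same sizes list could be passed there instead of using split_batch.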

I’m a beginner at all of this. Are you aware of a simple code example that does this, so I can learn from it?