DataParallel not effective


(Qiao Jin) #1

Hi all,
I have quite a large model and need to do data parallelism across multiple GPUs.
I used:

model = nn.DataParallel(model)

And there are three visible GPUs. The GPU usage is:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     12444      C   python                                     10741MiB |
|    1     12444      C   python                                      4663MiB |
|    2     12444      C   python                                      4633MiB |
+-----------------------------------------------------------------------------+

GPU1 and GPU2 are not fully utilized, but I cannot increase the batch size because then there will be a memory error on GPU0.
Does anyone know how to solve this problem?


(Andy-jpa) #2

DataParallel imbalanced memory usage

Hello, I am having the same issue. Have you solved it? Thanks.


(Liujie Zhang) #3

Hi Andy-jpa,
If the first GPU (id 0) is occupied all the time and has no free memory left, you could consider using only GPUs 1 and 2 by explicitly assigning device_ids in your code.

Reference: https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html
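
For the high-level API, a minimal sketch might look like the following (the toy nn.Linear module, the shapes, and the device IDs are illustrative assumptions, not from the thread):

import torch
import torch.nn as nn

# Toy module for illustration; substitute your own model.
# Parameters must live on device_ids[0] (GPU 1 here).
model = nn.Linear(128, 10).to('cuda:1')

# Restrict DataParallel to GPUs 1 and 2 and gather outputs on GPU 1,
# so no replicas or gathered outputs are allocated on the busy GPU 0
model = nn.DataParallel(model, device_ids=[1, 2], output_device=1)

batch = torch.randn(64, 128, device='cuda:1')
output = model(batch)  # the batch is scattered across GPUs 1 and 2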

Or, using the lower-level parallel primitives directly, code like this:

import torch.nn as nn

def data_parallel(module, input, device_ids, output_device=None):
    # No devices given: run the module on the input as-is
    if not device_ids:
        return module(input)

    # Gather results on the first listed device by default
    if output_device is None:
        output_device = device_ids[0]

    # Copy the module onto each device and split the batch across them
    replicas = nn.parallel.replicate(module, device_ids)
    inputs = nn.parallel.scatter(input, device_ids)
    # scatter may return fewer chunks than devices for small batches
    replicas = replicas[:len(inputs)]
    # Run all replicas in parallel, then collect outputs on output_device
    outputs = nn.parallel.parallel_apply(replicas, inputs)
    return nn.parallel.gather(outputs, output_device)
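
Hypothetical usage of the helper above (the module, shapes, and device IDs are assumptions for illustration):

import torch
import torch.nn as nn

module = nn.Linear(128, 10).to('cuda:1')  # weights on the first target device
input = torch.randn(64, 128, device='cuda:1')

# The batch is split across GPUs 1 and 2; outputs are gathered on GPU 1
output = data_parallel(module, input, device_ids=[1, 2])
print(output.shape)  # torch.Size([64, 10])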