Hi all,
I have quite a large model and need to do data-parallel training across multiple GPUs.
I used:
model = nn.DataParallel(model)
And there are three visible GPUs. The GPU usage is:
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     12444      C   python                                    10741MiB  |
|    1     12444      C   python                                     4663MiB  |
|    2     12444      C   python                                     4633MiB  |
+-----------------------------------------------------------------------------+
GPU 1 and GPU 2 are not fully utilized, but I cannot increase the batch size because that causes an out-of-memory error on GPU 0.
Does anyone know how to solve this problem?
Hi Andy-jpa,
If the first GPU (id 0) is occupied all the time and has no free memory left, you could consider using only GPUs 1 and 2 by explicitly assigning them in your code via the device_ids argument.
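For example, something like this should work (the nn.Linear is just a placeholder for your own model):

```python
import torch.nn as nn

model = nn.Linear(10, 10)  # placeholder for your own model

# Parameters must live on device_ids[0] (here GPU 1); GPU 0 stays untouched.
model = model.cuda(1)
model = nn.DataParallel(model, device_ids=[1, 2])

# Alternatively, launch with CUDA_VISIBLE_DEVICES=1,2 and keep the defaults.
```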
I think the underlying problem is that the batch size has to increase for multiple GPUs to be used efficiently, but the gathered outputs grow with it. Roughly:

each replica holds ≈ (batch_size / num_GPUs) × input_size
the output_device additionally holds ≈ (batch_size / num_GPUs) × num_GPUs × output_size = batch_size × output_size

So with 3 GPUs, each replica only sees a third of the batch, but the outputs for the whole batch end up gathered on the output_device,
and in the current DataParallel implementation the output_device must also hold a replica.
Therefore I tried rewriting DataParallel so that the output_device is excluded from the replicas.
In my case, I could then increase the batch size until the output_device GPU used 30 GB while each replica GPU used about 20 GB.
But I’m not sure it’s correct.
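Roughly, the rewrite looks like this. It is only a sketch built on the functional helpers in torch.nn.parallel, and OutputElsewhereDataParallel is just a name I made up, not my exact code:

```python
import torch.nn as nn
from torch.nn.parallel import replicate, scatter, parallel_apply, gather

class OutputElsewhereDataParallel(nn.Module):
    """Sketch: replicate the module only on replica_ids and gather the
    outputs on a separate output_device that holds no replica."""

    def __init__(self, module, replica_ids, output_device):
        super().__init__()
        self.module = module              # parameters live on replica_ids[0]
        self.replica_ids = replica_ids
        self.output_device = output_device

    def forward(self, *inputs):
        # split the batch across the replica GPUs
        scattered = scatter(inputs, self.replica_ids)
        replicas = replicate(self.module, self.replica_ids[:len(scattered)])
        outputs = parallel_apply(replicas, scattered)
        # concatenate every per-replica output on the output GPU only
        return gather(outputs, self.output_device)

# e.g. replicas on GPUs 1 and 2, outputs gathered on GPU 0:
# model = OutputElsewhereDataParallel(model.cuda(1), [1, 2], 0)
```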
I know there is still latency to hide before the processing becomes really efficient.
The best solution would probably be to switch to concurrent processing.
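By “concurrent processing” I mean one process per GPU; as far as I understand, that is what PyTorch’s DistributedDataParallel provides. A rough, untested sketch (worker and the nn.Linear are placeholders):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = nn.Linear(1024, 1024).cuda(rank)  # placeholder for the real model
    model = DDP(model, device_ids=[rank])
    # ... each process trains on its own shard of the data ...

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```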
I have only been using PyTorch since last week, after switching from TensorFlow 2, so I would like to know the best practice in PyTorch.