Hello, I followed the online DataParallel tutorial and I can't get the model to split compute evenly across GPUs at score time (the forward pass of a trained model). On 3 GPUs, I get something like this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00001961:00:00.0 Off |                    0 |
| N/A   53C    P0   224W / 300W |  15248MiB / 16130MiB |     93%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00003673:00:00.0 Off |                    0 |
| N/A   49C    P0    86W / 300W |   7004MiB / 16130MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00005A1F:00:00.0 Off |                    0 |
| N/A   54C    P0    76W / 300W |   6996MiB / 16130MiB |     85%      Default |
+-------------------------------+----------------------+----------------------+
So GPU 0 and GPU 2 are usually loaded while GPU 1 sits underutilized. I also see a very large lag between batches: almost 1-2 seconds of idle time during which all three GPUs are at 0%, then they do some compute, then drop back to 0% again.
My guess is that the syncing on GPU 0 is the culprit. Is there a way to run a distributed operation across multiple GPUs for scoring in PyTorch so that memory usage and compute are spread evenly? Note that this is different from training, since I'm not computing a loss or aggregating gradients.
The code is here: https://github.com/waldeland/CNN-for-ASI/blob/master/test_parallel.py. I already tried calling .to(device) before wrapping in DataParallel and specifying device_ids, but nothing seems to help. Another option would be DistributedDataParallel, I suppose, but I'd like to understand why this isn't working first.
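For context, here is a stripped-down sketch of how I'm wrapping the model, simplified rather than copied verbatim from test_parallel.py; MyCNN and loader are placeholders for the real network and DataLoader:

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0")

model = MyCNN()                       # placeholder for the trained network
model.to(device)                      # tried moving weights to GPU 0 first
model = nn.DataParallel(model, device_ids=[0, 1, 2])
model.eval()

with torch.no_grad():                 # scoring only, no loss or gradients
    for batch in loader:              # placeholder DataLoader
        batch = batch.to(device)      # inputs land on GPU 0 first...
        out = model(batch)            # ...then get scattered to GPUs 0/1/2,
                                      # and outputs are gathered back on GPU 0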