DataParallel performance?

Hi guys, I’m running a WRN (wide residual network) implementation with DataParallel across 4 GPUs, but I’m only seeing about a 50% reduction in epoch time. Is this normal? GPU utilization sits between 88 and 98 percent, which doesn’t seem too bad.

What kind of speedup would you expect from a 4-GPU setup with DataParallel on a reasonably large model like WRN?

Does it have anything to do with SLI setups? I’m running on Google Cloud Platform and I have no idea how they set up their GPUs.
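
For reference, this is roughly how I’m wrapping the model (a simplified sketch; the small Sequential stands in for the actual WRN):

```python
import torch
import torch.nn as nn

# Placeholder for the actual WRN; any nn.Module is wrapped the same way.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),
)

if torch.cuda.device_count() > 1:
    # Splits each input batch across GPUs, replicates the model,
    # and gathers outputs back on the default GPU.
    model = nn.DataParallel(model)
model = model.cuda()
```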


Speedup depends on a lot of factors, including the number of model parameters, the data shape, and GPU interconnect latency. For example, I get a -10% (negative) speedup on MNIST models, but a 100% speedup on more complex models.

50% sounds like you’re leaving some performance on the table. Try bigger models or bigger batch sizes and see whether you get a better speedup.
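
If you want to measure this directly, something like the sketch below can help (my rough helper, assuming a classification loss and a standard DataLoader yielding `(images, targets)` batches):

```python
import time
import torch

def time_epoch(model, loader, device="cuda"):
    # Rough wall-clock time for one forward+backward pass over the loader.
    criterion = torch.nn.CrossEntropyLoss()
    torch.cuda.synchronize()  # CUDA ops are async; sync before starting the clock
    start = time.time()
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        loss = criterion(model(images), targets)
        loss.backward()
        model.zero_grad()
    torch.cuda.synchronize()  # ...and again before stopping it
    return time.time() - start
```

Compare the single-GPU time at batch size B against the 4-GPU DataParallel time at batch size 4×B; that keeps the per-GPU work constant, so the remaining gap is mostly overhead.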


Also be aware that you may be rate-limited by the gradient reduction phase if your model has a very large number of parameters. See my earlier question on the matter.
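
As a quick sanity check, you can estimate the per-step gradient traffic DataParallel has to reduce back to the default GPU from the parameter count (a sketch; the 4 bytes per parameter assumes fp32 gradients):

```python
import torch.nn as nn

def reduction_traffic_mb(model: nn.Module, bytes_per_param: int = 4) -> float:
    # Gradient volume (in MB) reduced to the default GPU on every step;
    # bytes_per_param=4 assumes fp32 gradients.
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return n_params * bytes_per_param / 1e6
```

A WRN-28-10, for instance, has roughly 36.5M parameters, so on the order of 146 MB of gradients per step.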
