Hi guys, I’m running a WRN (Wide ResNet) implementation with DataParallel on 4 GPUs. I’m seeing only about a 50% reduction in epoch time. Is this normal? GPU utilization is between 88 and 98 percent, which doesn’t seem too bad.
What kind of speedup should I expect from a 4-GPU setup with DataParallel on a reasonably large model like WRN?
Does it have anything to do with SLI setups? I’m running on Google Cloud Platform and I’ve no idea how they set up their GPUs.
Speedup depends on a lot of factors, including the number of model parameters, data shape, and GPU bus interconnect latency. For example, I get a -10% speedup (i.e. a slowdown) for MNIST models, and a 100% speedup for more complex models.
50% sounds like you are leaving some performance on the table. Try bigger models or bigger batch sizes and see if you can get a better speedup.
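To put numbers on it, here is a quick sketch of the arithmetic, assuming the 50% means epoch time was cut in half compared to a single GPU. The helper names are made up for illustration. Under that reading, you are getting a 2x speedup, which is 50% parallel efficiency on 4 GPUs.

```python
def speedup_from_time_reduction(reduction):
    """Convert a fractional drop in epoch time (0.5 = 50% shorter) into a speedup factor."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    # New time is (1 - reduction) of the old time, so speedup is the inverse.
    return 1.0 / (1.0 - reduction)


def parallel_efficiency(speedup, num_gpus):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / num_gpus


s = speedup_from_time_reduction(0.5)   # epoch time halved
print(s)                               # 2.0
print(parallel_efficiency(s, 4))       # 0.5
```

Perfect scaling on 4 GPUs would be a 75% reduction in epoch time, so it helps to state whether a percentage refers to time saved or to throughput gained when comparing results.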
Also be aware that you may be rate limited by the reduction phase if your model has a very large number of parameters. See my earlier question on the matter.
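For a rough feel of how much the reduction phase can cost, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: gradients stored as 32-bit floats, a made-up parameter count, and a made-up link bandwidth. It only estimates the time to move one replica's gradients across one link and ignores latency, overlap with compute, and the actual communication pattern, so plug in your own numbers.

```python
def gradient_bytes(num_params, bytes_per_param=4):
    """Size of one full set of gradients, assuming fp32 (4 bytes) by default."""
    return num_params * bytes_per_param


def transfer_seconds(num_params, bandwidth_bytes_per_s, bytes_per_param=4):
    """Time to copy one replica's gradients over a single link (very rough)."""
    return gradient_bytes(num_params, bytes_per_param) / bandwidth_bytes_per_s


# Hypothetical example values, not measurements:
params = 36_500_000      # illustrative parameter count for a large model
bandwidth = 10e9         # illustrative 10 GB/s interconnect

ms = transfer_seconds(params, bandwidth) * 1000
print(f"~{ms:.1f} ms per replica per step")
```

If that figure is a noticeable fraction of your per-step compute time, a larger batch size helps because the transfer cost stays the same while the compute per step grows.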