nn.DataParallel - Multi-GPU scaling

I just want to ask whether there are any benchmarks on what efficiency is to be expected with nn.DataParallel on standard tasks such as ImageNet/CIFAR-10 for common architectures?

For example, how much faster (in iterations/s) will torchvision.models.resnet18 be on two, three, or four GPUs? How much faster will resnet152 be?

What's your personal experience with multi-GPU scaling?
I see ~85% scaling efficiency for ResNet-50, for example, with the maximum batch size.
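For reference, this is the kind of minimal timing loop I use to measure iterations/s (the model, batch size, and iteration count here are just placeholders; DataParallel splits the batch across all visible GPUs and falls back to running the module directly when fewer than two are available):

```python
import time
import torch
import torch.nn as nn

# Placeholder model standing in for e.g. torchvision.models.resnet18
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))
data = torch.randn(64, 256)

if torch.cuda.is_available():
    model = model.cuda()
    data = data.cuda()

# nn.DataParallel scatters the batch across all visible GPUs;
# with zero or one GPU it simply runs the wrapped module as-is.
dp_model = nn.DataParallel(model)

n_iters = 20
start = time.time()
for _ in range(n_iters):
    out = dp_model(data)
if torch.cuda.is_available():
    torch.cuda.synchronize()  # CUDA kernels are async; wait before timing
elapsed = time.time() - start

print(f"{n_iters / elapsed:.1f} it/s, output shape {tuple(out.shape)}")
```

Comparing the it/s number with `CUDA_VISIBLE_DEVICES` restricted to one GPU versus all of them gives the scaling efficiency I quoted above.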