Benefit of nn.DataParallel without batched input

Consider applications such as neural style transfer, where the input is not a batch but a single image. Can an application like this benefit from multiple GPUs if the model is wrapped in nn.DataParallel?

I’m asking because I want to be sure before I buy new hardware. If I understand multiprocessing correctly, multiple GPUs are useless for such applications, and a single GPU with a lot of memory would be the better choice.

Indeed, nn.DataParallel works by splitting each batch across the available GPUs. Multiprocessing, however, is generally used to load data with multiple CPU-based workers before it is moved onto the GPU, which can speed up training considerably.
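To make the splitting concrete, here is a minimal sketch of what happens to the batch dimension. It only imitates the split with `torch.chunk` and does not call nn.DataParallel itself; the helper name `split_like_data_parallel` and the tensor sizes are made up for illustration.

```python
import torch


def split_like_data_parallel(batch, num_devices):
    # Hypothetical helper: split along dim 0 (the batch dimension).
    # torch.chunk returns at most `num_devices` pieces, and fewer
    # when the batch is smaller than the number of devices.
    return torch.chunk(batch, num_devices, dim=0)


single_image = torch.rand(1, 3, 64, 64)    # batch size 1
batch_of_eight = torch.rand(8, 3, 64, 64)  # batch size 8

single_chunks = split_like_data_parallel(single_image, 4)
batch_chunks = split_like_data_parallel(batch_of_eight, 4)

# Number of pieces for the single image, then per-piece batch sizes.
print(len(single_chunks))
print([chunk.shape[0] for chunk in batch_chunks])
```

With a batch of one there is only a single piece to hand out, so the remaining devices would have nothing to work on.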

I’m not familiar with style transfer training, but I don’t see why training would happen with single-image batches. Even for inference, you might want to process multiple images at once…

In style transfer you do not train a model. Instead, you use the outputs of intermediate layers of a pretrained model to optimize (“train”) a single image. Thus, nothing is loaded from the drive during the optimization.
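A rough sketch of that kind of optimization, under several assumptions: the small random conv stack `features` is only a stand-in for a pretrained network, the loss is a plain feature-matching MSE rather than a full content plus style loss, and the sizes, optimizer, and step count are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a pretrained feature extractor (assumption: real code
# would use intermediate layers of a pretrained network instead).
features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
)
for param in features.parameters():
    param.requires_grad_(False)  # the model itself is never trained

# Features of a fixed target image, computed once.
target_image = torch.rand(1, 3, 32, 32)
target_features = features(target_image).detach()

# The only thing being optimized is this single image.
image = torch.rand(1, 3, 32, 32, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

losses = []
for step in range(50):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(features(image), target_features)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Note that the optimizer receives `[image]` rather than the model parameters, and the batch dimension stays at 1 throughout.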

Interesting, thanks for the clarification! In that case, I think using DataParallel will only add overhead.

Thanks for the input, but since you ‘only’ think that multiple GPUs would be useless, I will keep this thread open a little longer in the hope that someone who knows for sure weighs in.

As @alex.veuthey said, if you are only using a single image, the data parallel approach won’t speed up anything.

I see no reason why you couldn’t add a batch dimension and process multiple images at a time, and thus be able to leverage DataParallel. Just make sure you don’t reduce any of the losses across the batch dimension.
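One way this could look, as a sketch only: compute one loss value per image by averaging over every dimension except the batch dimension. The helper name `per_image_mse` and the tensor shapes are made up, and plain MSE stands in for whatever loss the application actually uses.

```python
import torch
import torch.nn.functional as F


def per_image_mse(output, target):
    # Hypothetical helper: keep the element-wise errors, then average
    # over channels and pixels only, so the batch dimension survives.
    elementwise = F.mse_loss(output, target, reduction="none")
    return elementwise.flatten(start_dim=1).mean(dim=1)


images = torch.rand(4, 3, 16, 16, requires_grad=True)  # four images optimized together
targets = torch.rand(4, 3, 16, 16)

losses = per_image_mse(images, targets)  # shape (4,), one loss per image

# Summing (rather than averaging) the per-image losses means each
# image's gradient comes from its own loss alone and is not rescaled
# by the batch size.
losses.sum().backward()
```

Each image then receives the same gradient it would get if it were optimized on its own.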

@ptrblck Thanks for the clarification.

@rwightman As explained above, during the whole training (maybe “optimisation” is a better fit here) you only process a single image. You do not train a model; you optimise the pixel values of a single image to minimise a given loss function.