Can DistributedDataParallel still work efficiently on a single GPU?

From what I tested, DDP still runs on a single GPU, and the DistributedSampler simply splits the dataset across one replica - meaning each epoch sees the full dataset unchanged. With that said, are there any benefits to using DDP on a single GPU over a normal single-GPU training script?

No, I don’t think you should expect any performance benefit from launching DDP on a single GPU. That said, aside from the small source-code changes (process group initialization, the DDP wrapper, and the sampler), I would also expect no difference in behavior compared to a standalone single-GPU script.
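To see why the single-GPU case degenerates to the plain dataset, here is a simplified, pure-Python sketch of the index-splitting logic that `torch.utils.data.DistributedSampler` uses (no shuffling, `drop_last=False`); the function name is made up for illustration:

```python
import math

def distributed_sampler_indices(dataset_len, num_replicas, rank):
    # Simplified sketch of DistributedSampler's splitting logic
    # (shuffle=False, drop_last=False), for illustration only.
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Pad with repeated indices so the total divides evenly among replicas.
    indices += indices[: total_size - len(indices)]
    # Each rank takes a strided slice of the (padded) index list.
    return indices[rank:total_size:num_replicas]

# With a single replica (one process / one GPU), the "split" is the
# whole dataset, unchanged:
print(distributed_sampler_indices(5, num_replicas=1, rank=0))  # [0, 1, 2, 3, 4]

# With two replicas, each rank sees half (index 0 reappears as padding):
print(distributed_sampler_indices(5, num_replicas=2, rank=0))  # [0, 2, 4]
print(distributed_sampler_indices(5, num_replicas=2, rank=1))  # [1, 3, 0]
```

So with `num_replicas=1` the sampler is an identity pass over the indices, and the only remaining overhead is the (essentially free) gradient "all-reduce" within a single process.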