What is the difference between DataParallel and DistributedDataParallel?

Could you please elaborate on “When run in a 1 gpu / process [i.e., one GPU per process] configuration Distributed Data Parallel can be beneficial as CPU based overheads are now spread across multiple processes”? Thanks!