Can you please elaborate on “When run in a 1 gpu / process configuration Distributed Data Parallel can be beneficial as CPU based overheads are now spread across multiple processes”? Thanks!