I have recently started experimenting with data-parallel training using Distributed Data Parallel (DDP).
As I understand it, DDP training works by creating a replica of the model on each device (e.g., each GPU), splitting the data across the replicas, training each replica on its own portion, and synchronizing the weights between them. My question is: when exactly are the weights synchronized between replicas? At the end of each epoch?
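To make the question concrete, here is a toy, pure-Python simulation of one scheme I have seen described: synchronous data parallelism in which the replicas exchange *gradients* (not weights) after every mini-batch step. This is an assumption for illustration only. It uses no real DDP API, and `local_gradient`, `all_reduce_mean`, and `train` are hypothetical names. The model is a one-parameter linear fit `y = w * x`, and each replica treats its whole shard as one mini-batch to keep things short.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for the toy model y = w * x


def local_gradient(w: float, shard: List[Sample]) -> float:
    """Gradient of the mean squared error computed on one replica's shard."""
    return sum(2.0 * x * (w * x - y) for x, y in shard) / len(shard)


def all_reduce_mean(values: List[float]) -> float:
    """Hypothetical stand-in for a collective all-reduce: every replica gets the average."""
    return sum(values) / len(values)


def train(data: List[Sample], num_replicas: int, steps: int, lr: float) -> List[float]:
    # Assumption: all replicas start from the same initial weight.
    weights = [0.0] * num_replicas
    # Split the data into equally sized shards, one per replica.
    shard_size = len(data) // num_replicas
    shards = [data[r * shard_size:(r + 1) * shard_size] for r in range(num_replicas)]
    for _ in range(steps):
        # Each replica computes a gradient on its own shard only.
        grads = [local_gradient(weights[r], shards[r]) for r in range(num_replicas)]
        # Synchronization point: average the gradients across replicas every step.
        avg = all_reduce_mean(grads)
        # Every replica applies the same averaged gradient, so the weights stay identical
        # without ever being copied between replicas.
        weights = [w - lr * avg for w in weights]
    return weights


data = [(float(x), 3.0 * x) for x in range(1, 9)]
print(train(data, num_replicas=4, steps=200, lr=0.01))
```

In this sketch, the replicas never need a separate weight-synchronization phase (per epoch or otherwise) because they start equal and always apply the same averaged update. With equal shard sizes, the averaged gradient also equals the full-batch gradient.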
I would appreciate it if you could enlighten me.