Checkpoint in DDP

Do we have to save a checkpoint after every epoch in DDP (DistributedDataParallel), or can we just keep the best checkpoint in a variable and save it once after all the iterations are over?

I don’t think you have to save checkpoints during training, but it’s a good idea to save some intermediate checkpoints in case your training run hits an issue and crashes.
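
For reference, here is a minimal sketch of how an intermediate checkpoint is often saved in a DDP run. It assumes the process group is already initialized and `model` is wrapped in `DistributedDataParallel`; the function name, parameters, and file path are illustrative, not a fixed API:

```python
import torch
import torch.distributed as dist

def save_checkpoint(model, optimizer, epoch, path="checkpoint.pt"):
    # In DDP every process holds an identical copy of the model,
    # so only rank 0 needs to write the checkpoint to disk.
    if dist.get_rank() == 0:
        torch.save(
            {
                "epoch": epoch,
                # model is assumed to be wrapped in DistributedDataParallel,
                # so we save the underlying module's state_dict.
                "model_state_dict": model.module.state_dict(),
                "optimizer_state_dict": optimizer.state_dict(),
            },
            path,
        )
    # Keep the other ranks from racing ahead while rank 0 writes the file.
    dist.barrier()
```

Saving only from rank 0 avoids multiple processes writing the same file, and the `dist.barrier()` keeps all ranks in sync before training continues.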

Thank you, will do that.