DDP Learning-Rate

More discussion can be found in the thread "Should we split batch_size according to ngpu_per_node when DistributedDataparallel".
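
For anyone who does not follow the link, here is a rough sketch of the arithmetic that thread is about. It assumes one DDP process per GPU, with gradients averaged across processes. The function names and the linear learning-rate scaling heuristic are illustrative assumptions, not something stated in this post, so treat it as a starting point rather than a recommendation.

```python
def per_process_batch_size(global_batch_size: int,
                           ngpu_per_node: int,
                           num_nodes: int = 1) -> int:
    """Split a global batch size evenly across all DDP processes.

    Assumes one process per GPU, so world_size = ngpu_per_node * num_nodes.
    (Hypothetical helper, not part of PyTorch.)
    """
    world_size = ngpu_per_node * num_nodes
    if global_batch_size % world_size != 0:
        raise ValueError("global batch size must be divisible by world size")
    return global_batch_size // world_size


def linearly_scaled_lr(base_lr: float,
                       base_batch_size: int,
                       global_batch_size: int) -> float:
    """Scale the learning rate in proportion to the global batch size.

    This is the common linear-scaling heuristic; whether it suits a given
    model is an assumption to verify experimentally.
    """
    return base_lr * global_batch_size / base_batch_size


if __name__ == "__main__":
    # Keep the global batch at 256 on a single node with 4 GPUs.
    print(per_process_batch_size(256, ngpu_per_node=4))
    # Grow the global batch from 256 to 1024 and scale the LR to match.
    print(linearly_scaled_lr(0.1, base_batch_size=256, global_batch_size=1024))
```

If the per-process batch size is instead left unchanged when adding GPUs, the global batch grows with the number of processes, which is the case where people usually consider adjusting the learning rate.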
