Amp delay_unscale with Apex DDP

krishansubudhi · February 25, 2020, 3:35pm

I am using amp's delay_unscale option to accumulate gradients. I am also using Apex DDP to all-reduce gradients across processes, and I disable the all-reduce during gradient accumulation by calling disable_allreduce() on the Apex DDP wrapper.

Just before the forward pass of the iteration where I want to reduce the gradients, I call enable_allreduce() on the Apex DDP wrapper and set delay_unscale to False. I then call backward() on the scaled loss, clip the gradients, and step the optimizer.
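To make the schedule concrete, here is a minimal sketch of the per-iteration decisions described above. It is not the actual training code: `StepPlan`, `plan_step`, `plan_epoch`, and `accumulation_steps` are hypothetical names introduced for illustration, and the sketch assumes the all-reduce happens on every `accumulation_steps`-th micro-batch. The Apex calls appear only in comments, so the block runs without Apex or a GPU.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepPlan:
    """Flags for one micro-batch (hypothetical helper, not part of Apex)."""
    allreduce_enabled: bool  # enable_allreduce() vs. disable_allreduce() on Apex DDP
    delay_unscale: bool      # value passed to amp.scale_loss(..., delay_unscale=...)
    clip_and_step: bool      # whether to clip gradients and call optimizer.step()


def plan_step(micro_step: int, accumulation_steps: int) -> StepPlan:
    """Decide the flags for a zero-based micro-batch index."""
    # Assumption: the last micro-batch of each accumulation window is the
    # one where gradients are reduced, unscaled, clipped, and applied.
    is_boundary = (micro_step + 1) % accumulation_steps == 0
    return StepPlan(
        allreduce_enabled=is_boundary,
        delay_unscale=not is_boundary,
        clip_and_step=is_boundary,
    )


def plan_epoch(num_micro_steps: int, accumulation_steps: int) -> List[StepPlan]:
    """Build the plan for a run of consecutive micro-batches."""
    return [plan_step(i, accumulation_steps) for i in range(num_micro_steps)]


# Where the flags would be used in a training loop (comments only; assumes
# `model` is wrapped in apex.parallel.DistributedDataParallel and that
# `max_norm` is a clipping threshold chosen by the user):
#
#   plan = plan_step(i, accumulation_steps)
#   if plan.allreduce_enabled:
#       model.enable_allreduce()    # set before the forward pass
#   else:
#       model.disable_allreduce()
#   loss = criterion(model(inputs), targets)
#   with amp.scale_loss(loss, optimizer,
#                       delay_unscale=plan.delay_unscale) as scaled_loss:
#       scaled_loss.backward()
#   if plan.clip_and_step:
#       torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), max_norm)
#       optimizer.step()
#       optimizer.zero_grad()

if __name__ == "__main__":
    for i, plan in enumerate(plan_epoch(4, accumulation_steps=2)):
        print(i, plan)
```

With `accumulation_steps=2`, micro-batches 0 and 2 only accumulate (all-reduce off, unscale delayed), while micro-batches 1 and 3 reduce, unscale, clip, and step.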

Is this a correct approach?