I’m now using ZeroRedundancyOptimizer with autocast and have noticed that the updated parameters are synced across all ranks after each step. However, in autocast mode with a GradScaler, the optimizer step is skipped when inf/NaN gradients are found. Could this cause a hang when one rank skips the optimization step while the other ranks are still waiting for the synchronization?
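For context, my training loop looks roughly like this (a minimal sketch; the model, criterion, data loader, and hyperparameters are placeholders, not my actual code):

import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes dist.init_process_group() has been called and that `model`,
# `criterion`, and `loader` are defined elsewhere; `rank` is the local rank.
model = DDP(model.to(rank), device_ids=[rank])
optimizer = ZeroRedundancyOptimizer(
    model.parameters(), optimizer_class=torch.optim.SGD, lr=1e-3
)
scaler = torch.cuda.amp.GradScaler()

for inputs, targets in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(inputs.to(rank)), targets.to(rank))
    scaler.scale(loss).backward()
    # If inf/NaN gradients are found on this rank, scaler.step() skips the
    # inner optimizer.step(), so _sync_params() would never be reached here.
    scaler.step(optimizer)
    scaler.update()

And here is the step() implementation of ZeroRedundancyOptimizer that I’m looking at: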
def step(
    self,
    closure: Optional[Callable[[], float]] = None,
    **kwargs: Any,
) -> Optional[float]:
    r"""
    Performs a single optimizer step and syncs parameters across all ranks.

    Arguments:
        closure (callable): a closure that re-evaluates the model and
            returns the loss; optional for most optimizers.
    Returns:
        Optional loss depending on the underlying local optimizer.

    .. note: Any extra parameters are passed to the base optimizer as-is.
    """
    if self._overlap_with_ddp:
        logging.warning(
            "`step()` should not be included in the training loop when "
            "`overlap_with_ddp=True`"
        )
        return None

    # Perform the local optimizer step
    loss = self._local_step(closure=closure, **kwargs)

    # Sync all of the updated parameter shards across the ranks
    self._sync_params()
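    # ^ as far as I can tell, _sync_params() issues collective broadcasts,
    # so every rank has to reach this call or the other ranks will block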
    return loss
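If my reading is right, a contrived two-rank run along these lines should deadlock: gradients are forced to be non-finite on rank 0 only (no DDP gradient averaging here, so the ranks can disagree), which should make scaler.step() skip the inner step, and with it _sync_params(), on rank 0 while rank 1 enters the broadcast. This is a hypothetical sketch, launched with torchrun --nproc_per_node=2:

import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer

dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

# Two parameters so that each of the two ranks owns a non-empty shard.
params = [torch.nn.Parameter(torch.ones(8, device=rank)) for _ in range(2)]
optimizer = ZeroRedundancyOptimizer(params, optimizer_class=torch.optim.SGD, lr=0.1)
scaler = torch.cuda.amp.GradScaler()

# Rank 0 produces inf gradients; rank 1 produces finite ones.
factor = float("inf") if rank == 0 else 1.0
loss = sum((p * factor).sum() for p in params)
scaler.scale(loss).backward()
scaler.step(optimizer)  # rank 0 should skip the step; rank 1 should not
scaler.update()

Is this understanding correct, or is there something that prevents the hang in practice?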