I'm using ZeroRedundancyOptimizer in torch 1.8, and I noticed that in ZeRO's step function there is an update_param_groups call before self.optim.step().
I wonder whether this function broadcasts the gradients that self.optim uses to compute the new parameters. I don't see any docstring explaining where the gradients go.
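For context, here's a minimal sketch of how I'm calling it (setup boilerplate reduced; the optimizer_class keyword is what the current tutorial uses, and the 1.8 prototype API may name it slightly differently):

```python
import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes this runs under a launcher (e.g. torch.distributed.launch)
# that sets RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT.
dist.init_process_group("gloo")

model = DDP(torch.nn.Linear(10, 10))

# ZeRO wraps a regular optimizer and shards its state across ranks.
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.Adam,  # keyword may differ in the 1.8 prototype
    lr=1e-3,
)

out = model(torch.randn(20, 10))
out.sum().backward()  # DDP all-reduces the gradients during backward, as I understand it
optimizer.step()      # the step() whose internals I'm asking about
```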
Looking forward to any replies!