Do gradients propagate through all_reduce & all_gather?

@derJaeger when you say “travel back”, do you mean the gradients flowing back to each individual GPU? If so, the answer is no: gradients will not automatically flow back to each GPU’s samples if you use the c10d collectives, because the c10d collectives are not autograd-enabled yet.

We are working on making the c10d collectives autograd-enabled. There is a version of the implementation that you can try and refer to here, but it is not publicly documented, has not been officially released, and is not well maintained, so use it at your own risk (we might delete it in a future release once the c10d collectives are directly autograd-enabled). If you want to use it, I recommend referring to that implementation and writing your own version.