All-reduce Algorithm

I am conducting a survey on distributed training. What type of all-reduce algorithm does PyTorch use for distributed training?

Facebook mentioned using a reduce-scatter + all-gather approach (the halving-doubling algorithm). I'm curious to see what PyTorch supports, and hopefully someone can shed light on why a certain method is more popular.
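
To make the question concrete, here is my understanding of that scheme as a small single-process simulation in plain Python: a reduce-scatter done by recursive halving, followed by an all-gather done by recursive doubling. This is only a sketch for discussion, not taken from PyTorch or any of its communication backends. The function name `halving_doubling_allreduce` is my own, and I'm assuming the number of ranks is a power of two and the vector length is divisible by it.

```python
def halving_doubling_allreduce(inputs):
    """Simulate a sum all-reduce over len(inputs) ranks (hypothetical helper).

    inputs[r] is the vector held by rank r. Returns the per-rank buffers
    after the all-reduce; every rank should end up with the element-wise sum.
    """
    p = len(inputs)
    n = len(inputs[0])
    assert p > 0 and p & (p - 1) == 0, "assumes a power-of-two number of ranks"
    assert n % p == 0, "assumes vector length divisible by number of ranks"

    bufs = [list(v) for v in inputs]
    seg = [(0, n)] * p  # index range [lo, hi) each rank is responsible for

    # Phase 1: reduce-scatter by recursive halving.
    # Partner distance goes p/2, p/4, ..., 1; the exchanged data halves each step.
    d = p // 2
    while d >= 1:
        new_bufs = [list(b) for b in bufs]  # snapshot: exchanges are simultaneous
        new_seg = list(seg)
        for r in range(p):
            partner = r ^ d
            lo, hi = seg[r]
            mid = (lo + hi) // 2
            # The rank with bit d clear keeps the lower half, the other the upper half.
            keep = (lo, mid) if r & d == 0 else (mid, hi)
            for i in range(*keep):
                new_bufs[r][i] = bufs[r][i] + bufs[partner][i]
            new_seg[r] = keep
        bufs, seg = new_bufs, new_seg
        d //= 2

    # Phase 2: all-gather by recursive doubling.
    # Partner distance goes 1, 2, ..., p/2; the exchanged data doubles each step.
    d = 1
    while d < p:
        new_bufs = [list(b) for b in bufs]
        new_seg = list(seg)
        for r in range(p):
            partner = r ^ d
            plo, phi = seg[partner]
            new_bufs[r][plo:phi] = bufs[partner][plo:phi]  # copy partner's reduced block
            lo, hi = seg[r]
            new_seg[r] = (min(lo, plo), max(hi, phi))
        bufs, seg = new_bufs, new_seg
        d *= 2

    return bufs


if __name__ == "__main__":
    ranks = [[1, 2, 3, 4], [10, 20, 30, 40],
             [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
    for r, buf in enumerate(halving_doubling_allreduce(ranks)):
        print(f"rank {r}: {buf}")
```

Each phase takes log2(p) communication steps in this sketch, compared with the 2(p - 1) steps of a ring all-reduce, which is part of why I'm asking which approach is used in practice and when.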