Distributed implementation on multiple nodes

How can I reduce only some of the ranks? For example:

node1: gpu1 (rank0) … gpu4 (rank3)
node2: gpu1 (rank4) … gpu8 (rank11)

node1 and node2 are in the same process group.

Sometimes I only want to reduce across the ranks that are on the same node (node1: gpu1 … gpu4). As far as I can tell, torch.distributed only lets me reduce across all ranks at the same time. A minimal sketch of what I have in mind is below.
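For illustration, this is a sketch of what I'm trying to do, assuming the rank layout above and using `dist.new_group` to build a subgroup containing only node1's ranks. The helper names `setup_node1_group` and `reduce_on_node1` are just placeholders I made up. Is this the right approach?

```python
import torch
import torch.distributed as dist

# Assumed layout from the example above: node1 holds ranks 0-3, node2 holds ranks 4-11.
NODE1_RANKS = [0, 1, 2, 3]

def setup_node1_group():
    # new_group must be called by every rank in the world process group,
    # even the ranks that will not be members of the new subgroup.
    return dist.new_group(ranks=NODE1_RANKS)

def reduce_on_node1(tensor: torch.Tensor, rank: int, node1_group) -> torch.Tensor:
    # Only the member ranks actually enter the collective; the other ranks skip it.
    if rank in NODE1_RANKS:
        dist.all_reduce(tensor, op=dist.ReduceOp.SUM, group=node1_group)
    return tensor
```

In my understanding the subgroup should be created once right after `dist.init_process_group`, and then passed into every node-local reduce call, but I'm not sure if that's how it's meant to be used across multiple nodes.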