In a multi-GPU DDP environment, if the loss on one rank is NaN while the others are normal, could this cause the all-reduce to hang?

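No, as answered in your cross post. An all-reduce only combines tensor values and never inspects them, so a NaN on one rank simply propagates into the reduced gradients on every rank, and the collective completes normally. A hang happens only when the ranks stop issuing the same sequence of collective calls. For example, one rank might detect the NaN and skip `backward()` while the others proceed.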
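A NaN value and a missing collective call behave very differently. The difference can be sketched with a toy model in plain Python. This is a hypothetical illustration, not how NCCL or Gloo are implemented: `ToyAllReduce` and `run` are made-up names, and a `threading.Barrier` stands in for the property that every rank must enter the collective before any rank can leave it.

```python
import math
import threading


class ToyAllReduce:
    """Hypothetical stand-in for a blocking SUM all-reduce across threads."""

    def __init__(self, world_size, timeout):
        self.values = [0.0] * world_size
        self.result = None
        # Every participant blocks here until all world_size ranks arrive.
        self.barrier = threading.Barrier(world_size, timeout=timeout)

    def __call__(self, rank, value):
        self.values[rank] = value
        # Phase 1: wait for all contributions; exactly one thread gets index 0
        # and computes the sum. The sum does not care whether a value is NaN.
        if self.barrier.wait() == 0:
            self.result = sum(self.values)
        # Phase 2: wait until the sum has been published before reading it.
        self.barrier.wait()
        return self.result


def run(world_size, values, skip_rank=None, timeout=1.0):
    op = ToyAllReduce(world_size, timeout)
    results = {}

    def worker(rank):
        if rank == skip_rank:
            # Simulates a rank that saw a NaN loss and skipped the collective.
            results[rank] = "skipped"
            return
        try:
            results[rank] = op(rank, values[rank])
        except threading.BrokenBarrierError:
            # The barrier timed out because not every rank showed up.
            results[rank] = "timed out"

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # NaN on rank 0: the reduction completes and every rank receives NaN.
    print(run(2, [float("nan"), 1.0]))
    # Rank 0 skips the call: rank 1 waits until the timeout fires.
    print(run(2, [float("nan"), 1.0], skip_rank=0, timeout=0.5))
```

In real PyTorch, the analogue of the second case is a rank that skips `loss.backward()` or an explicit `dist.all_reduce` call. The `timeout` argument of `torch.distributed.init_process_group` bounds how long the remaining ranks wait before erroring out. If you want to drop a NaN step, every rank has to agree on it, for example by all-reducing a "found NaN" flag first.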