Should we set non_blocking to True?
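For context on the `non_blocking` question, here is a minimal sketch. It assumes a single CUDA device, and `copy_to_device` is a hypothetical helper, not a PyTorch API. The flag only gives a truly asynchronous host-to-device copy when the source tensor sits in pinned (page-locked) host memory. On a CPU-only machine it has no effect.

```python
import torch

def copy_to_device(cpu_tensor: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Hypothetical helper: copy a host tensor to `device`, asynchronously when possible."""
    if device.type == "cuda":
        # non_blocking=True can only overlap the copy with host work
        # when the source lives in pinned (page-locked) host memory.
        src = cpu_tensor.pin_memory()
        return src.to(device, non_blocking=True)
    # Without a GPU the flag is irrelevant, so a plain copy is enough.
    return cpu_tensor.to(device)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = torch.arange(8, dtype=torch.float32)
y = copy_to_device(x, device)
if device.type == "cuda":
    # Explicit sync so host-side timing/inspection sees the finished copy.
    torch.cuda.synchronize(device)
print(y.sum().item())  # prints 28.0
```

Host-to-device copies issued this way are still ordered on the current stream, so kernels launched afterwards see the data. The caveat is the reverse direction: a `non_blocking` device-to-host copy must be followed by a synchronization before the CPU reads the result.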

Thanks @ptrblck – I imagine I can also use the `torch.comm` package? I don't want to initialize a process group, just run collectives over NCCL.
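Assuming `torch.comm` here refers to `torch.cuda.comm`, a minimal sketch of single-process, multi-GPU collectives without any process group might look like this. `broadcast_then_sum` is a hypothetical helper. `torch.cuda.comm.broadcast` and `torch.cuda.comm.reduce_add` can use NCCL internally when it is available, but they only cover GPUs visible to one process; they are not a replacement for `torch.distributed` across processes or nodes.

```python
import torch
import torch.cuda.comm as comm

def broadcast_then_sum(t: torch.Tensor):
    """Hypothetical helper: broadcast `t` to every visible GPU, then sum the copies onto GPU 0."""
    devices = list(range(torch.cuda.device_count()))
    # Put the source on the first target GPU; broadcast() reuses it there.
    src = t.cuda(devices[0])
    # broadcast() returns one tensor per listed device.
    copies = comm.broadcast(src, devices)
    # reduce_add() sums same-shaped tensors living on different GPUs
    # and returns the result on `destination`.
    total = comm.reduce_add(copies, destination=devices[0])
    return copies, total

if torch.cuda.is_available():
    x = torch.ones(4)
    copies, total = broadcast_then_sum(x)
    # Each GPU holds one copy of ones(4), so every entry equals the GPU count.
    print(len(copies), total.tolist())
else:
    print("No CUDA device visible; torch.cuda.comm needs at least one GPU.")
```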