PyTorch Forums
Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels
distributed
baiyuting
(Baiyuting)
May 19, 2024, 11:48pm
Did you solve it? I ran into the same problem, and I am confused about why NumelIn=1 and NumelOut=1.