Getting "Detected mismatch between collectives on ranks" with FSDP

Hi, I use FSDP on 2 A100 nodes and get a "Detected mismatch between collectives on ranks" error, but
the same run on 1 A100 node is fine. How can I debug this error? Thank you!
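Since, per the log below, every failing rank is broadcasting a Float tensor of shape [131074048] while rank 0's fingerprint records only OpType=BROADCAST, I suspect the ranks don't all agree on the model being synced at FSDP init. Is something like the following a reasonable way to confirm that every rank builds identical parameters before wrapping? (A minimal sketch; check_model_consistency is just a debugging helper I put together, not my actual training code.)

import torch
import torch.distributed as dist

def check_model_consistency(model: torch.nn.Module) -> None:
    # Fingerprint this rank's model: total element count plus every
    # parameter and buffer shape. If these differ across ranks, the
    # coalesced broadcast in FSDP's module-state sync cannot match.
    fingerprint = {
        "num_elems": sum(p.numel() for p in model.parameters()),
        "param_shapes": [tuple(p.shape) for p in model.parameters()],
        "buffer_shapes": [tuple(b.shape) for b in model.buffers()],
    }
    gathered = [None] * dist.get_world_size()
    dist.all_gather_object(gathered, fingerprint)  # collective: call on every rank
    if dist.get_rank() == 0:
        for rank, fp in enumerate(gathered):
            if fp != gathered[0]:
                print(f"rank {rank} disagrees with rank 0: "
                      f"{fp['num_elems']} vs {gathered[0]['num_elems']} elements")

I would call this on every rank right after building the model and before the FSDP(...) wrap, then compare the printouts between the two nodes.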

env:

Name: torch
Version: 2.0.0+cu118
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /root/miniconda3/lib/python3.10/site-packages
Requires: filelock, jinja2, networkx, sympy, triton, typing-extensions
Required-by: flash-attn, torchaudio, torchmetrics, torchvision, triton

log:

File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 437, in __init__
  File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 426, in _init_param_handle_from_module
  File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 426, in _init_param_handle_from_module
    _init_param_handle_from_module(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 426, in _init_param_handle_from_module
        _sync_module_params_and_buffers(_sync_module_params_and_buffers(

_init_param_handle_from_module(  File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 912, in _sync_module_params_and_buffers
  File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 912, in _sync_module_params_and_buffers

  File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 426, in _init_param_handle_from_module
    _sync_params_and_buffers(
    _sync_module_params_and_buffers(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 912, in _sync_module_params_and_buffers
  File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/utils.py", line 160, in _sync_params_and_buffers
    _sync_module_params_and_buffers(
      File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 912, in _sync_module_params_and_buffers
dist._broadcast_coalesced(
RuntimeError: Detected mismatch between collectives on ranks. Rank 1 is running collective: CollectiveFingerPrint(OpType=BROADCAST, TensorShape=[131074048], TensorDtypes=Float, TensorDeviceTypes=TensorOptions(dtype=float (default), device=cuda, layout=Strided (default), requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt))), but Rank 0 is running collective: CollectiveFingerPrint(OpType=BROADCAST).
    _sync_params_and_buffers(
    _sync_params_and_buffers(  File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/utils.py", line 160, in _sync_params_and_buffers

  File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/utils.py", line 160, in _sync_params_and_buffers
    _sync_params_and_buffers(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/utils.py", line 160, in _sync_params_and_buffers
    dist._broadcast_coalesced(
    RuntimeErrordist._broadcast_coalesced(:
Detected mismatch between collectives on ranks. Rank 4 is running collective: CollectiveFingerPrint(OpType=BROADCAST, TensorShape=[131074048], TensorDtypes=Float, TensorDeviceTypes=TensorOptions(dtype=float (default), device=cuda, layout=Strided (default), requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt))), but Rank 0 is running collective: CollectiveFingerPrint(OpType=BROADCAST).
RuntimeError: Detected mismatch between collectives on ranks. Rank 5 is running collective: CollectiveFingerPrint(OpType=BROADCAST, TensorShape=[131074048], TensorDtypes=Float, TensorDeviceTypes=TensorOptions(dtype=float (default), device=cuda, layout=Strided (default), requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt))), but Rank 0 is running collective: CollectiveFingerPrint(OpType=BROADCAST).
        _sync_params_and_buffers(dist._broadcast_coalesced(

RuntimeError  File "/root/miniconda3/lib/python3.10/site-packages/torch/distributed/utils.py", line 160, in _sync_params_and_buffers
: Detected mismatch between collectives on ranks. Rank 7 is running collective: CollectiveFingerPrint(OpType=BROADCAST, TensorShape=[131074048], TensorDtypes=Float, TensorDeviceTypes=TensorOptions(dtype=float (default), device=cuda, layout=Strided (default), requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt))), but Rank 0 is running collective: CollectiveFingerPrint(OpType=BROADCAST).
    dist._broadcast_coalesced(
RuntimeError: Detected mismatch between collectives on ranks. Rank 6 is running collective: CollectiveFingerPrint(OpType=BROADCAST, TensorShape=[131074048], TensorDtypes=Float, TensorDeviceTypes=TensorOptions(dtype=float (default), device=cuda, layout=Strided (default), requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt))), but Rank 0 is running collective: CollectiveFingerPrint(OpType=BROADCAST).
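If I read the traceback right, the failing broadcast is the one _sync_module_params_and_buffers issues during FSDP.__init__, which as far as I can tell only runs when sync_module_states=True (rank 0 coalesces its parameters and broadcasts them to the other ranks). For context, a minimal sketch of the init path the traceback implies (not my exact code; build_model() is a placeholder for my real model constructor, and check_model_consistency is the helper sketched above):

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = build_model()            # placeholder: must produce identical shapes on every rank
check_model_consistency(model)   # debugging helper from the sketch above
model = FSDP(
    model,
    device_id=torch.cuda.current_device(),
    sync_module_states=True,     # the broadcast in the traceback comes from this path
)

Also, does the fact that rank 0's fingerprint shows no tensor shape mean rank 0 actually issued a different broadcast (e.g., a differently sized model on node 0), or is that just how the mismatch is reported?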