FSDP hangs when using MoE

Has anyone experienced a similar problem?

Self-resolved; see "FSDP hangs when combing MoE architecture", Issue #126616 in pytorch/pytorch on GitHub.