Is the torch Muon optimizer compatible with FSDP/HSDP?

Hi all,
I’m curious whether Muon’s official implementation supports sharded data-parallel strategies such as FSDP or HSDP, or only DDP.

Thanks

Collaborate on a proof-of-concept, FSDP-compatible Muon optimizer that is both logically correct and communication-efficient.
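
For context on what "logically correct" means here, a minimal sketch of why sharding is not automatically safe for Muon. Muon orthogonalizes each 2D update with a Newton-Schulz iteration, which depends on the whole matrix. If every rank ran that iteration on its local shard only, the result would differ from the unsharded one. The code below is plain Python with no torch or distributed calls. The helper names, the row-wise sharding, and the 2x2 gradient are illustrative assumptions, and the quintic coefficients are the ones I believe the reference Muon code uses, so please check them against the implementation you care about.

```python
# Sketch only: Newton-Schulz orthogonalization (the core of a Muon step) in pure Python.
# Assumptions: quintic coefficients as in the commonly cited reference Muon code,
# and row-wise sharding as a simplified stand-in for what FSDP does to a parameter.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def newton_schulz(g, steps=5, eps=1e-7):
    """Approximately orthogonalize g (push its singular values toward 1)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    tall = len(g) > len(g[0])
    x = transpose(g) if tall else [row[:] for row in g]
    # Normalize by the Frobenius norm so all singular values are at most 1.
    norm = sum(v * v for row in x for v in row) ** 0.5 + eps
    x = [[v / norm for v in row] for row in x]
    for _ in range(steps):
        gram = matmul(x, transpose(x))            # A = X X^T
        gram2 = matmul(gram, gram)                # A^2
        poly = [[b * p + c * q for p, q in zip(r1, r2)]
                for r1, r2 in zip(gram, gram2)]   # B = b*A + c*A^2
        px = matmul(poly, x)
        x = [[a * u + w for u, w in zip(r1, r2)]
             for r1, r2 in zip(x, px)]            # X = a*X + B X
    return transpose(x) if tall else x

# Hypothetical 2x2 gradient, sharded row-wise across two "ranks".
full_grad = [[1.0, 1.0], [0.0, 1.0]]

# What an unsharded (or DDP) run computes: orthogonalize the whole matrix.
full_update = newton_schulz(full_grad)

# What a naive sharded run would compute: each rank orthogonalizes only its own row.
sharded_update = [newton_schulz([row])[0] for row in full_grad]

mismatch = max(abs(u - v)
               for ru, rv in zip(full_update, sharded_update)
               for u, v in zip(ru, rv))
print("max elementwise mismatch:", mismatch)
```

The two results disagree, so an FSDP-compatible Muon has to reconstruct the full gradient matrix somewhere (for example by gathering shards onto one rank per parameter) before orthogonalizing. How and where that gather happens is what decides whether the approach is communication-efficient.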