Hi all,
I’m curious whether Muon’s official implementation supports data parallelism strategies beyond DDP, such as FSDP.
Thanks
Collaborate on a proof-of-concept, FSDP-compatible Muon optimizer that is both logically correct and communication-efficient.
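
To make the "logically correct" requirement concrete, here is a rough single-process sketch of why sharding matters for Muon. It is not taken from the official implementation. It assumes parameters are sharded row-wise across ranks (a simplification of how FSDP lays out shards), simulates two ranks with NumPy, and uses what I understand to be the quintic Newton-Schulz coefficients from Muon's reference code. The helper names (`newton_schulz`, `naive_sharded_update`, `gathered_update`) are made up for illustration.

```python
import numpy as np

def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2-D matrix with a quintic Newton-Schulz iteration."""
    # Coefficients assumed to match Muon's reference routine.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)  # Frobenius normalization
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # iterate on the wide orientation so X @ X.T is the smaller Gram matrix
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X

def naive_sharded_update(shards):
    """Each simulated rank orthogonalizes only its own shard (no communication)."""
    return [newton_schulz(s) for s in shards]

def gathered_update(shards):
    """Reassemble the full matrix first, orthogonalize it, then re-shard the result."""
    full = np.concatenate(shards, axis=0)  # stands in for an all-gather
    update = newton_schulz(full)
    return np.array_split(update, len(shards), axis=0)

rng = np.random.default_rng(0)
grad = rng.standard_normal((8, 16))          # hypothetical momentum/gradient matrix
shards = np.array_split(grad, 2, axis=0)     # two simulated ranks, row-wise shards

naive = np.concatenate(naive_sharded_update(shards), axis=0)
gathered = np.concatenate(gathered_update(shards), axis=0)
max_gap = float(np.abs(naive - gathered).max())
print("max |naive - gathered| =", max_gap)
```

The two results differ because the orthogonalization depends on the whole matrix, so a per-shard update is a different optimizer rather than a cheaper version of the same one. Gathering the full matrix on every rank fixes correctness but repeats the same computation everywhere; one possible direction for the communication-efficient part is to assign each matrix to a single rank, run the iteration there, and send back only the shards each rank owns.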