Hi all,
I’m curious whether Muon’s official implementation supports advanced data parallelism strategies other than DDP.
Thanks