FullyShardedDataParallel implements the method clip_grad_norm_, but what would be the equivalent for FSDP2 (fully_shard)? If there is no such method, how could it be implemented?
Thank you.
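
One possible approach, offered as a minimal sketch rather than a confirmed answer: since fully_shard represents parameters and their gradients as DTensors, the generic torch.nn.utils.clip_grad_norm_ utility may be usable directly on model.parameters(). The helper name clip_gradients below is hypothetical, and the demonstration runs on a plain, unsharded nn.Linear in a single process. That the same call behaves correctly on a fully_shard-wrapped model is an assumption here and has not been tested in this snippet.

```python
import torch
import torch.nn as nn


def clip_gradients(model: nn.Module, max_norm: float) -> torch.Tensor:
    """Clip gradients in place and return the total norm measured before clipping.

    Assumption: if `model` was wrapped with fully_shard, its parameters and
    gradients are DTensors and this same call applies to them.
    """
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    # Under FSDP2 the result may be a DTensor; full_tensor() would gather it
    # into an ordinary tensor. Plain tensors skip this branch.
    if hasattr(total_norm, "full_tensor"):
        total_norm = total_norm.full_tensor()
    return total_norm


# Single-process demonstration on a plain (unsharded) module.
torch.manual_seed(0)
model = nn.Linear(8, 4)
# Scale the loss so the gradients are large enough to be clipped.
loss = model(torch.randn(16, 8)).pow(2).sum() * 100.0
loss.backward()

norm_before = clip_gradients(model, max_norm=1.0)

# Recompute the global L2 norm over all gradients after clipping.
norm_after = torch.linalg.vector_norm(
    torch.stack([torch.linalg.vector_norm(p.grad) for p in model.parameters()])
)
print(f"before: {norm_before.item():.3f}, after: {norm_after.item():.3f}")
```

In a real training loop the call would go between loss.backward() and optimizer.step(), as with FSDP1's clip_grad_norm_.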