How to use FSDP and EMA together?

Hi, currently in PyTorch 2.0, FSDP models do not support deepcopy. How can I copy the model parameters as an EMA copy and update it?

For example, my FSDP model:

        from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

        sharding_strategy = ShardingStrategy.SHARD_GRAD_OP
        model = FSDP(model,
                     sharding_strategy=sharding_strategy,
                     ignored_parameters=not_trainable)

I can’t call deepcopy(model) directly, so how should I achieve EMA?

In general (even without FSDP) it is advised to only store/copy the model’s state_dict(), not the model itself. Have you tried that?
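
Something along these lines could work as a starting point. This is just a sketch that operates on the `model` from your snippet; `ema_state`, `update_ema`, and `decay` are placeholder names, not part of any FSDP API, and with FSDP you may need to control the state_dict type (e.g. via `FSDP.state_dict_type`) before calling `model.state_dict()`:

    import torch

    # Keep a detached copy of the weights as the EMA state (placeholder names,
    # not an FSDP API).  With FSDP, the kind of state_dict you get here depends
    # on how the state_dict type is configured.
    ema_state = {k: v.detach().clone() for k, v in model.state_dict().items()}

    @torch.no_grad()
    def update_ema(model, ema_state, decay=0.999):
        for k, v in model.state_dict().items():
            if v.dtype.is_floating_point:
                # Exponential moving average: ema = decay * ema + (1 - decay) * param
                ema_state[k].mul_(decay).add_(v.detach(), alpha=1.0 - decay)
            else:
                # Non-float entries (e.g. integer buffers): just copy them over.
                ema_state[k].copy_(v)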

Thanks. I will try storing/copying only the model’s state_dict().

I tried using summon_full_params to get the full parameters and update the EMA copy.
However, this seems to cause a lot of memory fragmentation (reserved memory >> allocated memory).
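
For reference, the attempt looked roughly like this (a sketch, not the exact code; `ema_model` is assumed to be a regular, non-FSDP copy of the model kept on each rank with parameters in the same order):

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    @torch.no_grad()
    def update_ema(model, ema_model, decay=0.999):
        # Gather the full (unsharded) parameters on every rank, then blend them
        # into the EMA copy.  writeback=False because the parameters are only read.
        with FSDP.summon_full_params(model, writeback=False):
            for p, ema_p in zip(model.parameters(), ema_model.parameters()):
                ema_p.mul_(decay).add_(p.detach(), alpha=1.0 - decay)

Gathering the full parameters on every rank at each EMA step is presumably what blows up the reserved memory.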

So I abandoned FSDP and switched to DeepSpeed instead.