FSDP Hybrid Shard Mode - is it feature complete?

I wanted to know whether the HYBRID_SHARD mode of FSDP has feature parity at this point with FSDP in FULL_SHARD or SHARD_GRAD_OP mode. For a given model (CLIP-like), saving the optimizer state_dict in the following fashion currently works for me in FULL_SHARD mode but fails in HYBRID_SHARD mode with an index-out-of-bounds error:

FSDP.set_state_dict_type(
    self.model, StateDictType.FULL_STATE_DICT,
    state_dict_config=FullStateDictConfig(rank0_only=True, offload_to_cpu=True),
    optim_state_dict_config=FullOptimStateDictConfig(rank0_only=True, offload_to_cpu=True),
)
state_dict = FSDP.optim_state_dict(self.model, optimizer)
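For reference, the three strategies compared above differ in what gets sharded versus replicated. A minimal sketch of the distinction (`pick_strategy` is a hypothetical helper for illustration, not part of the FSDP API):

```python
from torch.distributed.fsdp import ShardingStrategy

def pick_strategy(name: str) -> ShardingStrategy:
    """Map a short config name to an FSDP sharding strategy."""
    return {
        # Shard parameters, gradients, and optimizer state across all ranks.
        "full": ShardingStrategy.FULL_SHARD,
        # Shard gradients and optimizer state; keep unsharded params after forward.
        "grad_op": ShardingStrategy.SHARD_GRAD_OP,
        # FULL_SHARD within a node, replicate (DDP-style) across nodes.
        "hybrid": ShardingStrategy.HYBRID_SHARD,
    }[name]

assert pick_strategy("hybrid") is ShardingStrategy.HYBRID_SHARD
```

The cross-node replication in HYBRID_SHARD is why a bug might only surface on multiple nodes: the optimizer-state gather has to account for both the intra-node and inter-node process groups.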

I wanted to know if this is expected, or if I need to do something different for HYBRID_SHARD as opposed to FULL_SHARD. Thanks in advance, and I really appreciate all the great work you have done on FSDP and on PyTorch in general.

Could you file an issue on GitHub with a way to reproduce the problem?

Thanks, I’ll do that. The error only occurs in a multi-node setup; it works fine on a single node.
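For anyone following along, a minimal repro skeleton along these lines should be enough for the GitHub issue. This is a hypothetical sketch, not the reporter's actual training code; it assumes NCCL, one GPU per process, and a multi-node launch via torchrun:

```python
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    ShardingStrategy,
    StateDictType,
    FullStateDictConfig,
    FullOptimStateDictConfig,
)

def main() -> None:
    # Launch with, e.g.: torchrun --nnodes=2 --nproc-per-node=8 repro.py
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # Tiny stand-in model wrapped with HYBRID_SHARD (shard within a node,
    # replicate across nodes); swap in FULL_SHARD to see the working case.
    model = FSDP(
        torch.nn.Linear(8, 8).cuda(),
        sharding_strategy=ShardingStrategy.HYBRID_SHARD,
    )
    optimizer = torch.optim.Adam(model.parameters())

    # One step so the optimizer state actually exists before we gather it.
    model(torch.randn(4, 8, device="cuda")).sum().backward()
    optimizer.step()

    with FSDP.state_dict_type(
        model,
        StateDictType.FULL_STATE_DICT,
        FullStateDictConfig(rank0_only=True, offload_to_cpu=True),
        FullOptimStateDictConfig(rank0_only=True, offload_to_cpu=True),
    ):
        # Reportedly raises an index-out-of-bounds error on multi-node runs.
        osd = FSDP.optim_state_dict(model, optimizer)

    dist.destroy_process_group()

if __name__ == "__main__" and "RANK" in os.environ:
    main()
</imports>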