Hi,
I am using PyTorch 2.5.1 with FSDP and 2 optimizers that each handle a separate set of parameters.
Training runs fine, but I am running into errors when trying to get the full_state_dict during checkpointing. I am calling
get_state_dict(model, [optimizer1, optimizer2], options=StateDictOptions(full_state_dict=True))
and getting
RuntimeError: some.weight is not in the optimizer state
The error is raised while processing optimizer1, but the weight it complains about is one that optimizer2 handles.
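Roughly, my setup looks like this (the model construction and the backbone/head parameter split are placeholders, not my actual code):

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.checkpoint.state_dict import get_state_dict, StateDictOptions

# build_model() stands in for my real model construction
model = FSDP(build_model())

# Two optimizers, each owning a disjoint subset of the parameters
# (backbone/head are placeholder names for the two parameter groups)
optimizer1 = torch.optim.AdamW(model.backbone.parameters(), lr=1e-4)
optimizer2 = torch.optim.AdamW(model.head.parameters(), lr=1e-4)

# ... training loop runs fine ...

# This is the call that raises the RuntimeError above:
model_sd, optim_sd = get_state_dict(
    model,
    [optimizer1, optimizer2],
    options=StateDictOptions(full_state_dict=True),
)
```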
If I change the call to pass [optimizer2, optimizer1] instead,
I get the same error with the roles reversed: optimizer2 complains about a weight that is handled by optimizer1.
How do I get around this?
Thanks.