Hi,
I am using PyTorch 2.5.1 with FSDP and 2 optimizers that each handle a separate set of parameters.
Training runs fine, but I am running into errors when trying to get the full_state_dict during checkpointing. I am calling
get_state_dict(model, [optimizer1, optimizer2], options=StateDictOptions(full_state_dict=True))
and getting
RuntimeError: some.weight is not in the optimizer state
The error is raised while processing optimizer1, but the weight it complains about is one that optimizer2 handles.
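Roughly, my setup looks like this (the model construction and the backbone/head parameter split are placeholders, not my actual code):

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.checkpoint.state_dict import get_state_dict, StateDictOptions

# build_model() stands in for my real model construction
model = FSDP(build_model())

# Two optimizers, each owning a disjoint subset of the parameters
# (backbone/head are placeholder names for the two parameter groups)
optimizer1 = torch.optim.AdamW(model.backbone.parameters(), lr=1e-4)
optimizer2 = torch.optim.AdamW(model.head.parameters(), lr=1e-4)

# ... training loop runs fine ...

# This is the call that raises the RuntimeError above:
model_sd, optim_sd = get_state_dict(
    model,
    [optimizer1, optimizer2],
    options=StateDictOptions(full_state_dict=True),
)
```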
If I change the call to pass [optimizer2, optimizer1] instead,
I get the same error with the roles reversed: optimizer2 complains about a weight that is handled by optimizer1.
How do I get around this?
Thanks.