FSDP2 evaluate during training

XYG_123 · January 11, 2025, 12:56am

Hi team, I am using FSDP2 to shard my model.
What is the best practice for running evaluation (for example, calling eval_forward on the module) during training?

I know we can register forward hooks for eval_forward, but that might trigger hooks that are not needed.
Can I convert the model back to a non-FSDP module and shard it again after the evaluation finishes?

XYG_123 · January 12, 2025, 3:32am

I ended up with a hacky workaround:

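from torch.distributed.checkpoint import FileSystemReader, FileSystemWriter, load, save

def hack_unwrap_the_fsdp(model):
    # Save the sharded (FSDP2) state dict to disk as a distributed checkpoint.
    fs_storage_writer = FileSystemWriter(SAVED_PATH)
    save(
        state_dict=model.state_dict(),
        storage_writer=fs_storage_writer,
    )
    # Build a fresh, unsharded model and load the checkpoint into its state dict.
    unwrapped_model = ToyModel().cuda()
    fs_reader = FileSystemReader(SAVED_PATH)
    state_dict = unwrapped_model.state_dict()
    load(state_dict, storage_reader=fs_reader)
    unwrapped_model.load_state_dict(state_dict)
    return unwrapped_model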

The returned unwrapped_model is not sharded. I'm not sure if there is a better way.
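
One alternative that might avoid the round trip through disk is to route evaluation through forward itself, so that whatever hooks are attached to forward also run for the evaluation path. The sketch below assumes that FSDP2 attaches its unshard/reshard logic to the module's forward hooks; it only demonstrates the dispatch pattern on a plain, unsharded module and has not been verified against a sharded model. The `evaluate` keyword, the `evaluate` helper, and the body of this ToyModel are hypothetical stand-ins.

```python
import torch
import torch.nn as nn


class ToyModel(nn.Module):
    """Hypothetical stand-in for the ToyModel used above."""

    def __init__(self):
        super().__init__()
        self.net = nn.Linear(4, 2)

    def forward(self, x, evaluate=False):
        # Dispatch inside forward() so that model(x, evaluate=True) goes
        # through nn.Module.__call__ and any hooks registered on forward.
        if evaluate:
            return self.eval_forward(x)
        return self.net(x)

    def eval_forward(self, x):
        # Placeholder evaluation path: class predictions instead of logits.
        return self.net(x).argmax(dim=-1)


@torch.no_grad()
def evaluate(model, batch):
    # Switch to eval mode for the call, then restore the previous mode.
    was_training = model.training
    model.eval()
    try:
        return model(batch, evaluate=True)
    finally:
        model.train(was_training)
```

With this pattern the training loop would call evaluate(model, batch) on every rank and keep the model sharded throughout, at the cost of changing the signature of forward.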