By FSDP's design, using more cards should not cause an OOM, since each rank holds a smaller shard. Yet with 160 cards it OOMs inside _init_param_handle_from_module. I have enabled the ``sync_module_states`` option; could that be making a difference?
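To make the expectation concrete, here is a rough back-of-the-envelope sketch. The model size (7B parameters in fp32) and the helper name `sharded_bytes_per_rank` are hypothetical, not taken from my actual run, and it only counts the sharded parameters each rank keeps in steady state, not anything held temporarily while the module is being wrapped.

```python
def sharded_bytes_per_rank(num_params: int, bytes_per_param: int, world_size: int) -> int:
    """Approximate bytes of sharded parameters held by one rank.

    Ceil division is used as a rough stand-in for the padding needed
    to split the parameters evenly across ranks.
    """
    total_bytes = num_params * bytes_per_param
    return -(-total_bytes // world_size)


if __name__ == "__main__":
    NUM_PARAMS = 7_000_000_000  # hypothetical model size
    BYTES_PER_PARAM = 4         # fp32

    for world_size in (8, 80, 160):
        per_rank = sharded_bytes_per_rank(NUM_PARAMS, BYTES_PER_PARAM, world_size)
        print(f"{world_size:>4} cards: {per_rank / 1e9:.3f} GB of sharded params per rank")
```

By this estimate the per-rank share only shrinks as cards are added, which is why an OOM that appears at 160 cards during initialization looks unexpected to me.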