Struggling to parse this from the docs:
"For convenience, this accepts ModuleWrapPolicy directly, which allows users to specify the module classes to wrap (e.g. the transformer block)."
What happens to all the other modules, e.g. the input and output embeddings? Are they put in an “other” FSDP unit?
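
For concreteness, here is a minimal sketch of the kind of setup I mean. `Block`, `TinyLM`, `wrap_model` and `params_outside` are hypothetical names I made up for illustration, not anything from the docs. `wrap_model` assumes a process group has already been initialized, so it is only defined here, not called. The helper just lists the parameters that do not live inside any `Block`, i.e. the ones my question is about.

```python
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import ModuleWrapPolicy


class Block(nn.Module):
    """Hypothetical stand-in for a transformer block."""

    def __init__(self, dim: int) -> None:
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Linear(dim, dim)

    def forward(self, x):
        return x + self.ff(self.norm(x))


class TinyLM(nn.Module):
    """Hypothetical model: input embedding, a stack of blocks, output projection."""

    def __init__(self, vocab: int = 100, dim: int = 16, n_layers: int = 2) -> None:
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList([Block(dim) for _ in range(n_layers)])
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens):
        x = self.embed(tokens)
        for block in self.blocks:
            x = block(x)
        return self.out(x)


def wrap_model(model: nn.Module) -> FSDP:
    # Assumption: torch.distributed.init_process_group(...) was already called.
    # Only modules that are instances of Block are named in the policy.
    policy = ModuleWrapPolicy({Block})
    return FSDP(model, auto_wrap_policy=policy)


def params_outside(model: nn.Module, classes: tuple) -> list:
    """Names of parameters that are not inside any module of the given classes."""
    prefixes = [name + "." for name, m in model.named_modules() if isinstance(m, classes)]
    return [
        name
        for name, _ in model.named_parameters()
        if not any(name.startswith(p) for p in prefixes)
    ]
```

Calling `params_outside(TinyLM(), (Block,))` gives the embedding and output projection parameters. Those are the ones I can't place from reading the sentence above.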