Will modules passed to ModuleWrapPolicy each be in their own FSDP unit?

I'm struggling to parse this from the docs:

For convenience, this accepts ModuleWrapPolicy directly, which allows users to specify the module classes to wrap (e.g. the transformer block).

And what happens to all the other modules, e.g. the input and output embeddings? Are they put in an “other” FSDP unit?

It looks like it does what the title says: pytorch/torch/distributed/fsdp/wrap.py at main · pytorch/pytorch · GitHub
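
As for the leftover modules: as far as I understand it, there is no separate "other" unit. Parameters that are not inside any module matched by the policy stay with the nearest enclosing FSDP instance, which for embeddings is typically the root FSDP wrapper around the whole model. Here is a toy, stdlib-only sketch of that grouping. It is not FSDP's actual code; `Module`, `assign_units`, and the string `kind` labels are hypothetical stand-ins for `nn.Module` subclasses and the class set you would pass to `ModuleWrapPolicy`.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical stand-in for an nn.Module: a name, a class label, params, children."""
    name: str
    kind: str
    params: list = field(default_factory=list)
    children: list = field(default_factory=list)


def assign_units(root, wrap_kinds):
    """Group parameter names by the unit that would own them.

    Assumption: the root always forms a unit (the outermost FSDP wrapper),
    and every module whose kind is in wrap_kinds forms its own unit.
    Everything else falls through to the nearest enclosing unit.
    """
    units = {}

    def visit(mod, path, owner):
        if mod is root or mod.kind in wrap_kinds:
            owner = path  # this module starts a new unit
            units.setdefault(owner, [])
        for p in mod.params:
            units[owner].append(f"{path}.{p}")
        for child in mod.children:
            visit(child, f"{path}.{child.name}", owner)

    visit(root, root.name, None)
    return units


# A tiny transformer-shaped tree: embeddings plus two blocks.
model = Module("model", "Transformer", children=[
    Module("embed", "Embedding", ["weight"]),
    Module("block0", "Block", ["w"]),
    Module("block1", "Block", ["w"]),
    Module("lm_head", "Linear", ["weight"]),
])

units = assign_units(model, {"Block"})
for unit, params in units.items():
    print(unit, params)
```

In this sketch each `Block` instance ends up in its own unit, while the input embedding and output head both land in the root unit alongside each other.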
