FullyShardedDataParallel question

Using FSDP with torch 1.11.0 on a model containing an nn.ParameterList, FSDP(model) seems to remove the ParameterList's contents. I've even tried "manual wrapping", wrapping only the other layers (using the toy model below), but the contents of the ParameterList still change. I'm also getting a warning, 'Setting attributes on ParameterList is not supported', even though I'm trying to avoid wrapping the ParameterList itself. Here's the model being manually wrapped:

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel, CPUOffload
from torch.distributed.fsdp.wrap import enable_wrap, wrap

class model(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = wrap(nn.Linear(8, 4))
        self.layer2 = nn.Linear(4, 16)
        self.params = nn.ParameterList([nn.Parameter(torch.randn(10, 10)) for i in range(10)])
        # self.param = nn.Parameter(torch.randn(10, 10))
        self.layer3 = wrap(nn.Linear(16, 4))
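One more setup detail: FSDP needs an initialized default process group before wrap()/enable_wrap() are used. A minimal single-rank sketch for this toy repro (the address, port, and backend below are just placeholders; real multi-GPU runs would typically be launched with torchrun and use NCCL):

import os
import torch.distributed as dist

# Placeholder rendezvous settings for a single-process toy run
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
# One rank is enough to reproduce the behavior; real training launches one
# process per GPU and usually uses the "nccl" backend
dist.init_process_group(backend="gloo", rank=0, world_size=1)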

Here’s printing the model before and after wrapping:
print("orig model", model())
print("params", model().params)
wrapper_kwargs = dict(cpu_offload=CPUOffload(offload_params=True))
with enable_wrap(wrapper_cls=FullyShardedDataParallel, **wrapper_kwargs):
    fsdp_model = wrap(model())
print(fsdp_model)
print("params", fsdp_model.params)

and the output:
orig model model(
  (layer1): Linear(in_features=8, out_features=4, bias=True)
  (layer2): Linear(in_features=4, out_features=16, bias=True)
  (params): ParameterList(
    (0): Parameter containing: [torch.FloatTensor of size 10x10]
    (1): Parameter containing: [torch.FloatTensor of size 10x10]
    (2): Parameter containing: [torch.FloatTensor of size 10x10]
    (3): Parameter containing: [torch.FloatTensor of size 10x10]
    (4): Parameter containing: [torch.FloatTensor of size 10x10]
    (5): Parameter containing: [torch.FloatTensor of size 10x10]
    (6): Parameter containing: [torch.FloatTensor of size 10x10]
    (7): Parameter containing: [torch.FloatTensor of size 10x10]
    (8): Parameter containing: [torch.FloatTensor of size 10x10]
    (9): Parameter containing: [torch.FloatTensor of size 10x10]
  )
  (layer3): Linear(in_features=16, out_features=4, bias=True)
)
params ParameterList(
  (0): Parameter containing: [torch.FloatTensor of size 10x10]
  (1): Parameter containing: [torch.FloatTensor of size 10x10]
  (2): Parameter containing: [torch.FloatTensor of size 10x10]
  (3): Parameter containing: [torch.FloatTensor of size 10x10]
  (4): Parameter containing: [torch.FloatTensor of size 10x10]
  (5): Parameter containing: [torch.FloatTensor of size 10x10]
  (6): Parameter containing: [torch.FloatTensor of size 10x10]
  (7): Parameter containing: [torch.FloatTensor of size 10x10]
  (8): Parameter containing: [torch.FloatTensor of size 10x10]
  (9): Parameter containing: [torch.FloatTensor of size 10x10]
)
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py:487: UserWarning: Setting attributes on ParameterList is not supported.
  warnings.warn("Setting attributes on ParameterList is not supported.")
FullyShardedDataParallel(
  (_fsdp_wrapped_module): FlattenParamsWrapper(
    (_fpw_module): model(
      (layer1): FullyShardedDataParallel(
        (_fsdp_wrapped_module): FlattenParamsWrapper(
          (_fpw_module): Linear(in_features=8, out_features=4, bias=True)
        )
      )
      (layer2): Linear(in_features=4, out_features=16, bias=True)
      (params): ParameterList()
      (layer3): FullyShardedDataParallel(
        (_fsdp_wrapped_module): FlattenParamsWrapper(
          (_fpw_module): Linear(in_features=16, out_features=4, bias=True)
        )
      )
    )
  )
)
params [Parameter containing:
tensor([-0.2085, 0.1279, -0.1589, …, 1.9485, -0.1839, -0.6993],
       requires_grad=True)]

Can I apply FSDP to the model in some way that leaves the ParameterList entries alone?

Hey, thanks for trying out the FSDP API!

What you're seeing is expected: the ParameterList is picked up by the outermost FSDP wrapper, so its parameters are flattened and sharded together with the other unwrapped parameters, and you lose access to the original ParameterList entries.
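
One way to see this concretely (just a sketch reusing the toy model above): the per-entry parameter names disappear from the wrapped model, because everything that is not wrapped separately gets folded into the outer wrapper's flat parameter.

# Before wrapping: names like "params.0" ... "params.9" show up
print([name for name, _ in model().named_parameters()])
# After wrapping: those names are gone; only the flattened, sharded
# parameters of the FSDP wrappers remain
print([name for name, _ in fsdp_model.named_parameters()])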

Do you want the FSDP wrapper to ignore those params? We added a new argument in the nightly build called ignored_modules: modules passed there are not managed by FSDP, so they are not flattened or sharded, and their gradients are not synced either. Is this what you want?
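
A rough sketch of how that could look (this targets the nightly API mentioned above, so the exact signature may differ; ignored_modules expects an iterable of modules, and nn.ParameterList is itself a module):

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload

m = model()
fsdp_model = FSDP(
    m,
    cpu_offload=CPUOffload(offload_params=True),
    # Parameters owned by these modules are left alone: not flattened,
    # not sharded, and their gradients are not synced by FSDP
    ignored_modules=[m.params],
)
print(fsdp_model.params)  # still the original, untouched ParameterList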

Hi @Yanli_Zhao – this does indeed look like what I want. Thank you very much!