Using FSDP with torch 1.11.0 on a model containing an nn.ParameterList, FSDP(model) seems to empty the ParameterList. I've even tried manual wrapping, wrapping only the other layers (using the toy model below), but the ParameterList contents still change. I'm also getting a warning, 'Setting attributes on ParameterList is not supported', even though I'm trying to avoid wrapping the ParameterList itself. Here's the model being manually wrapped:
class model(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = wrap(nn.Linear(8, 4))
        self.layer2 = nn.Linear(4, 16)
        self.params = nn.ParameterList([nn.Parameter(torch.randn(10, 10)) for i in range(10)])
        # self.param = nn.Parameter(torch.randn(10, 10))
        self.layer3 = wrap(nn.Linear(16, 4))
Here’s printing the model before and after wrapping:
print("orig model", model())
print("params", model().params)
wrapper_kwargs = dict(cpu_offload=CPUOffload(offload_params=True))
with enable_wrap(wrapper_cls=FullyShardedDataParallel, **wrapper_kwargs):
    fsdp_model = wrap(model())
print(fsdp_model)
print("params", fsdp_model.params)
and the output:
orig model model(
(layer1): Linear(in_features=8, out_features=4, bias=True)
(layer2): Linear(in_features=4, out_features=16, bias=True)
(params): ParameterList(
(0): Parameter containing: [torch.FloatTensor of size 10x10]
(1): Parameter containing: [torch.FloatTensor of size 10x10]
(2): Parameter containing: [torch.FloatTensor of size 10x10]
(3): Parameter containing: [torch.FloatTensor of size 10x10]
(4): Parameter containing: [torch.FloatTensor of size 10x10]
(5): Parameter containing: [torch.FloatTensor of size 10x10]
(6): Parameter containing: [torch.FloatTensor of size 10x10]
(7): Parameter containing: [torch.FloatTensor of size 10x10]
(8): Parameter containing: [torch.FloatTensor of size 10x10]
(9): Parameter containing: [torch.FloatTensor of size 10x10]
)
(layer3): Linear(in_features=16, out_features=4, bias=True)
)
params ParameterList(
(0): Parameter containing: [torch.FloatTensor of size 10x10]
(1): Parameter containing: [torch.FloatTensor of size 10x10]
(2): Parameter containing: [torch.FloatTensor of size 10x10]
(3): Parameter containing: [torch.FloatTensor of size 10x10]
(4): Parameter containing: [torch.FloatTensor of size 10x10]
(5): Parameter containing: [torch.FloatTensor of size 10x10]
(6): Parameter containing: [torch.FloatTensor of size 10x10]
(7): Parameter containing: [torch.FloatTensor of size 10x10]
(8): Parameter containing: [torch.FloatTensor of size 10x10]
(9): Parameter containing: [torch.FloatTensor of size 10x10]
)
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py:487: UserWarning: Setting attributes on ParameterList is not supported.
warnings.warn("Setting attributes on ParameterList is not supported.")
FullyShardedDataParallel(
(_fsdp_wrapped_module): FlattenParamsWrapper(
(_fpw_module): model(
(layer1): FullyShardedDataParallel(
(_fsdp_wrapped_module): FlattenParamsWrapper(
(_fpw_module): Linear(in_features=8, out_features=4, bias=True)
)
)
(layer2): Linear(in_features=4, out_features=16, bias=True)
(params): ParameterList()
(layer3): FullyShardedDataParallel(
(_fsdp_wrapped_module): FlattenParamsWrapper(
(_fpw_module): Linear(in_features=16, out_features=4, bias=True)
)
)
)
)
)
params [Parameter containing:
tensor([-0.2085, 0.1279, -0.1589, …, 1.9485, -0.1839, -0.6993],
requires_grad=True)]
Is there a way to apply FSDP to the model that leaves the ParameterList entries alone?
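One direction I'm considering (a sketch, not a confirmed fix for 1.11.0): move the ParameterList inside a small container nn.Module, so that any attribute-setting during wrapping happens on the container rather than on the ParameterList itself, and the container can then be wrapped (or left unwrapped) as a unit. The ParamHolder name below is made up for illustration. Newer torch releases also appear to add an ignored_modules argument to FSDP, if upgrading is an option.

```python
import torch
import torch.nn as nn

class ParamHolder(nn.Module):
    """Hypothetical container module that owns the ParameterList, so FSDP
    interacts with this module instead of the bare ParameterList."""
    def __init__(self, n=10, shape=(10, 10)):
        super().__init__()
        self.params = nn.ParameterList(
            [nn.Parameter(torch.randn(*shape)) for _ in range(n)]
        )

    def forward(self):
        # Reduce each parameter so they all participate in the graph.
        return torch.stack([p.sum() for p in self.params])

holder = ParamHolder()
print(len(list(holder.parameters())))  # prints 10
```

In the toy model above, self.params would become self.holder = ParamHolder(), accessed as self.holder.params; whether FSDP then leaves the entries intact (or at least stops warning) would need to be verified on 1.11.0.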