Hi, I’m trying to set different learning rates for different layers in an FSDP model.
For DDP, model.named_parameters() gives me the names and corresponding weights of all layers, so I can simply filter out the norm layers and apply no weight decay to them.
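For reference, here is roughly what that looks like on the DDP side (toy model and hyperparameters just for illustration; the "norm"-in-name check is just my heuristic for catching norm-layer weights):

```python
import torch
import torch.nn as nn

# Toy model for illustration; in my real model the norm layers
# likewise show up with "norm" in their parameter names.
class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 16)
        self.norm = nn.LayerNorm(16)

model = Block()

decay, no_decay = [], []
for name, param in model.named_parameters():
    # Norm weights (and biases) go into the no-weight-decay group.
    if "norm" in name or name.endswith(".bias"):
        no_decay.append(param)
    else:
        decay.append(param)

optimizer = torch.optim.AdamW(
    [
        {"params": decay, "weight_decay": 0.01},
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=1e-4,
)
```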
With FSDP, however, named_parameters() only returns entries like ._fsdp_wrapped_module.flat_param, each holding a single flattened array, so the per-layer names I would filter on are gone. I’m wondering whether it is possible to apply no weight decay to the norm layers inside FSDP as well.
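A minimal sketch of what I mean, assuming a single-process gloo setup just to demonstrate (the exact flat-param name may differ across PyTorch versions):

```python
import os
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Single-process process group purely so FSDP can be constructed here.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = FSDP(nn.Sequential(nn.Linear(16, 16), nn.LayerNorm(16)))
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# On my side this prints a single flattened entry, something like:
#   _fsdp_wrapped_module.flat_param (N,)
# rather than one entry per layer, so I can no longer tell which
# slice of the flat parameter belongs to a norm layer.
```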