Setting different weight-decay values for parameters within one FSDP unit

Hi,

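You should be able to enable this by passing use_orig_params=True when wrapping the model in FSDP. That setting is required whenever you want multiple optimizer parameter groups inside one FSDP unit: without it, each unit exposes a single flattened parameter, so every original parameter in that unit has to share the same hyperparameters.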
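
Here is a minimal sketch of how that could look, in case it helps. The helper names (build_param_groups, make_optimizer), the "bias"/"norm" name filter and the AdamW settings are my own assumptions for illustration, not something FSDP prescribes. I match on parameter names rather than shapes because the sharded tensors you get back after wrapping may be flattened, so a check like p.ndim < 2 is not reliable there.

```python
def build_param_groups(named_params, weight_decay=0.01, no_decay_keys=("bias", "norm")):
    """Split (name, param) pairs into a decayed and a non-decayed optimizer group.

    Hypothetical helper: the keyword filter is an assumption, adjust it to
    the naming scheme of your own model.
    """
    decay, no_decay = [], []
    for name, param in named_params:
        if not getattr(param, "requires_grad", True):
            continue  # frozen parameters stay out of the optimizer
        # Substring match, so it still works if the name carries a wrapper prefix.
        if any(key in name.lower() for key in no_decay_keys):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


def make_optimizer(module, lr=1e-3):
    """Wrap `module` in FSDP and build AdamW with two parameter groups.

    Assumes torch.distributed.init_process_group() has already been called.
    """
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # use_orig_params=True keeps the original parameters visible through
    # named_parameters(), which is what makes per-parameter groups possible.
    fsdp_model = FSDP(module, use_orig_params=True)
    groups = build_param_groups(fsdp_model.named_parameters())
    return fsdp_model, torch.optim.AdamW(groups, lr=lr)
```

Note that the groups are built from the wrapped model's named_parameters(), after the FSDP call, so the optimizer holds the parameters FSDP actually manages.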