FSDP: Different learning rates for different weights

Hey there,
Is it possible to use different learning rates for different weights of a model wrapped in FSDP?

    model = FSDP( 
        model, auto_wrap_policy=fsdp_auto_wrap_policy, sharding_strategy=fsdp_sharding_strategy, 
        mixed_precision=fsdp_mixed_precision, cpu_offload=fsdp_cpu_offload, device_id=device,
        limit_all_gathers=fsdp_limit_all_gathers,
    )

    # SETUP OPTIMIZER, SCHEDULER & CRITERION
    # Split the parameters into two groups so the clip weights get their own learning rate.
    normal_params = nn.ParameterList()
    clip_params = nn.ParameterList()
    for k, v in model.named_parameters():
        print(k)
        # FSDP prefixes parameter names with "_fsdp_wrapped_module."
        if k.startswith("_fsdp_wrapped_module.clip"):
            clip_params.append(v)
        else:
            normal_params.append(v)
    print(f"Normal Parameters: {sum([p.numel() for p in normal_params])}")
    print(f"Clip Parameters: {sum([p.numel() for p in clip_params])}")


    optimizer = optim.AdamW([{"params": normal_params}, {"params": clip_params, "lr": lr_clip}], lr=lr) 

This only seems to pick up a fraction of the parameters, and as a result clip_params ends up empty. Is there a way to make this work?
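
For what it's worth, here is a rough sketch of what I was hoping would work, assuming use_orig_params=True makes FSDP expose the original parameters (and their names) through named_parameters() instead of only the flattened ones. I'm not sure how the FSDP prefixes nest, so the group check below is a substring match:

    model = FSDP(
        model, auto_wrap_policy=fsdp_auto_wrap_policy, sharding_strategy=fsdp_sharding_strategy,
        mixed_precision=fsdp_mixed_precision, cpu_offload=fsdp_cpu_offload, device_id=device,
        limit_all_gathers=fsdp_limit_all_gathers,
        use_orig_params=True,  # assumption: exposes the original (unflattened) parameters
    )

    normal_params, clip_params = [], []
    for name, param in model.named_parameters():
        # FSDP inserts "_fsdp_wrapped_module." prefixes into the names, so match on
        # the submodule name rather than a fixed prefix (assumption about the nesting).
        if ".clip." in name or name.startswith("_fsdp_wrapped_module.clip"):
            clip_params.append(param)
        else:
            normal_params.append(param)

    optimizer = optim.AdamW(
        [{"params": normal_params}, {"params": clip_params, "lr": lr_clip}], lr=lr
    )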

FSDP config:

    fsdp_auto_wrap_policy = ModuleWrapPolicy([ResBlock, AttnBlock, TimestepBlock, FeedForwardBlock])
    fsdp_sharding_strategy = ShardingStrategy.SHARD_GRAD_OP
    fsdp_fullstate_save_policy = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    fsdp_cpu_offload = None
    fsdp_mixed_precision = MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    )
    fsdp_limit_all_gathers = True  # False
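
One more thought: since the wrap policy only covers ResBlock, AttnBlock, TimestepBlock, and FeedForwardBlock, I assume everything else (including the clip weights) gets flattened into the root FSDP unit, which would explain why only a fraction of the names show up. Would adding the clip model's block class to the policy help, so that its parameters land in their own FSDP unit? Something like the sketch below, where ClipBlock is just a placeholder for whatever module class the clip encoder actually uses:

    # Hypothetical sketch: ClipBlock is a placeholder class name. Wrapping the clip
    # modules separately should keep "clip" in their (flattened) parameter names,
    # so they could at least be routed into their own optimizer group.
    fsdp_auto_wrap_policy = ModuleWrapPolicy(
        [ResBlock, AttnBlock, TimestepBlock, FeedForwardBlock, ClipBlock]
    )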