Hi, I’m training a model with FSDP and want to use different learning rates for different parameters. To get the full (unflattened) names of the parameters, I used the `summon_full_params()` context manager and then filtered the parameters by name into two buckets, `param_group_1` and `param_group_2`.
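Concretely, this is roughly what I’m doing (the `"bias"` predicate is just a stand-in for my real filter):

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# fsdp_model is my FSDP-wrapped model; the name filter below is illustrative
param_group_1, param_group_2 = [], []
with FSDP.summon_full_params(fsdp_model):
    for name, param in fsdp_model.named_parameters():
        if "bias" in name:
            param_group_1.append(param)
        else:
            param_group_2.append(param)
```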
Then I pass these groups into the optimizer:

```python
optimizer = torch.optim.SGD(
    [
        {"params": param_group_1, "lr": 1e-3},
        {"params": param_group_2, "lr": 1e-4},
    ]
)
```
However, this fails with:

```
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
If I pass all of the model’s parameters as a single group, it works, but I would like to tune the learning rates per group:

```python
optimizer = torch.optim.SGD(fsdp_model.parameters(), lr=1e-3)
```
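For context, the reason I reached for `summon_full_params()` in the first place is that, outside the context manager, `named_parameters()` on the wrapped model only seems to yield FSDP’s flattened parameters, so the original names I want to filter on aren’t available. A quick check (the name I mention is just what I’d expect to see; the exact names depend on the PyTorch version and wrapping policy):

```python
# Outside summon_full_params I only see flat parameters, with names
# along the lines of "_fsdp_wrapped_module._flat_param".
for name, _ in fsdp_model.named_parameters():
    print(name)
```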
Is there a correct way of doing this? Am I finding the parameter groups correctly?