I think the easier approach would be to compute the weight decay manually for the “used” parameters (and set the optimizer's own weight decay to 0) instead of trying to disable it for the “unused” ones.
Here is a simple example of adding a custom regularization.
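A minimal sketch, assuming PyTorch and a hypothetical model (`TwoBranchNet`) in which only one branch takes part in each forward pass. The helper names `l2_penalty` and `train_step` are illustrative and are not part of any library. The optimizer is created with `weight_decay=0.0`, and an L2 penalty is added to the loss only for the parameters of the branch that was actually used:

```python
import torch
import torch.nn as nn


class TwoBranchNet(nn.Module):
    """Hypothetical model: only one of the two branches is used per forward pass."""

    def __init__(self):
        super().__init__()
        self.branch_a = nn.Linear(4, 2)
        self.branch_b = nn.Linear(4, 2)

    def forward(self, x, use_a=True):
        return self.branch_a(x) if use_a else self.branch_b(x)


def l2_penalty(params, weight_decay):
    """Custom regularization: 0.5 * weight_decay * sum of squared parameter values."""
    return 0.5 * weight_decay * sum(p.pow(2).sum() for p in params)


torch.manual_seed(0)
model = TwoBranchNet()
# Built-in weight decay disabled; decay is added to the loss manually instead.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=0.0)
criterion = nn.MSELoss()
wd = 1e-2

x = torch.randn(8, 4)
y = torch.randn(8, 2)


def train_step(use_a):
    optimizer.zero_grad()
    out = model(x, use_a=use_a)
    # Only the parameters of the branch used in this pass are regularized.
    used = model.branch_a if use_a else model.branch_b
    loss = criterion(out, y) + l2_penalty(used.parameters(), wd)
    loss.backward()
    optimizer.step()
    return loss.item()
```

With plain SGD, the 0.5 factor makes the penalty's gradient `wd * p`, the same term that SGD's `weight_decay` argument adds to the gradient. Because the unused branch never enters the loss, its parameters receive no gradient and are left unchanged. Note that this is a coupled L2 penalty and is not equivalent to the decoupled decay used by `AdamW`.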