I have parameter groups down below with different learning rates using the Adam optimizer, and I would like to add weight decay. How is this done in PyTorch?
Also, does adding L2 regularization become redundant when using a weighted loss function, or can they both be used together?
Note that for optimizers other than plain-vanilla SGD, including Adam, weight decay and
L2 regularization are not technically exactly the same, although they both do pretty much
the same thing. See for example this discussion “Weight decay or L2 regularization?”
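To the first part of your question: in PyTorch you can give each parameter group its own weight_decay, the same way you already give each group its own lr, and any group that omits it falls back to the optimizer defaults. Here is a minimal sketch (the model, learning rates and decay values are just placeholders, not your actual setup):

import torch

# Hypothetical two-layer model, standing in for your real one.
model = torch.nn.Sequential(
    torch.nn.Linear(10, 20),
    torch.nn.ReLU(),
    torch.nn.Linear(20, 2),
)

# Each parameter group can carry its own lr and weight_decay;
# anything omitted falls back to the optimizer's defaults.
param_groups = [
    {"params": model[0].parameters(), "lr": 1e-3, "weight_decay": 1e-4},
    {"params": model[2].parameters(), "lr": 1e-4, "weight_decay": 1e-5},
]

# Adam applies weight_decay as an L2 penalty added to the gradient ...
optimizer = torch.optim.Adam(param_groups, lr=1e-3)

# ... while AdamW applies "decoupled" weight decay directly to the weights.
optimizer_decoupled = torch.optim.AdamW(param_groups, lr=1e-3)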
I assume that by “weighted loss” you mean something like class weights such as CrossEntropyLoss’s weight constructor argument. If so, this is separate from L2
regularization (or weight decay) and the two can sensibly be used together.
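As a concrete illustration of the two being used together, here is a short sketch with made-up class weights, dimensions and data:

import torch

# Hypothetical class weights for an imbalanced three-class problem.
class_weights = torch.tensor([1.0, 2.0, 0.5])

# The weight argument rescales the per-class loss terms ...
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)

# ... while weight_decay penalizes large parameter values; the two
# address different issues and can be combined.
model = torch.nn.Linear(10, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# One illustrative training step on random data.
inputs, targets = torch.randn(8, 10), torch.randint(0, 3, (8,))
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
optimizer.zero_grad()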
Really appreciate the response, Frank. I have a quick question regarding the cyclic learning rate scheduler. I understand that, according to Leslie N. Smith, before using the scheduler we must determine min_lr and max_lr. To do this, an LR range test is used to find the point where the loss begins to decrease (our min_lr) and the point where the loss begins to diverge (our max_lr).
Is there any PyTorch implementation of this LR range test?
From what I’ve seen, fast.ai has an implementation, but I’m not sure if it’s okay to use:
from torch_lr_finder import LRFinder
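Something like this is what I had in mind, based on that package’s documented usage (the model, data and the base_lr/max_lr values at the end are placeholders, and I haven’t verified every detail of the API):

import torch
from torch.utils.data import DataLoader, TensorDataset
from torch_lr_finder import LRFinder  # pip install torch-lr-finder

# Dummy model and data just to make the sketch self-contained.
model = torch.nn.Linear(10, 2)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-4)
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,))),
    batch_size=32,
)

# LR range test: sweep the learning rate from the optimizer's lr up to
# end_lr, recording the loss at each step.
lr_finder = LRFinder(model, optimizer, criterion, device="cpu")
lr_finder.range_test(train_loader, end_lr=10, num_iter=100)
lr_finder.plot()   # inspect the curve to read off base_lr and max_lr
lr_finder.reset()  # restore the model and optimizer to their initial state

# Plug the values read off the plot (placeholders here) into CyclicLR.
# cycle_momentum=False because Adam has no "momentum" option to cycle.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2, cycle_momentum=False
)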
And if I were to use a subset of my data (20% of 35k images) to tune hyperparameters, is the general consensus that results on such a subset are likely to reflect the full training data?