How to add L2 Regularization to Parameter groups

I have the parameter groups below with different learning rates using the Adam optimizer, and I would like to add weight decay. How is this done in PyTorch?

Also, does adding L2 regularization become redundant when using a weighted loss function, or could they both be used together?

def define_optimizer():
  optimizer = optim.Adam([
      {'params' : model.classifier.parameters(), 'lr':0.001},
      {'params' : model.features[7].parameters(), 'lr' : 0.00001},
      {'params' : model.features[8].parameters(), 'lr' : 0.00001}
      ])
  return optimizer

Hi Abas!

Simply use Adam’s weight_decay parameter:

      {'params': model.classifier.parameters(), 'lr': 0.001, 'weight_decay': 0.00001},
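
If you want decay on every group, your define_optimizer() would then look roughly like this (just a sketch, using one arbitrary value for all three groups; each group can take its own value, and groups that omit the key fall back to the optimizer-wide default of 0):

def define_optimizer():
  optimizer = optim.Adam([
      {'params': model.classifier.parameters(), 'lr': 0.001, 'weight_decay': 0.00001},
      {'params': model.features[7].parameters(), 'lr': 0.00001, 'weight_decay': 0.00001},
      {'params': model.features[8].parameters(), 'lr': 0.00001, 'weight_decay': 0.00001}
      ])
  return optimizer

You can also pass weight_decay as a keyword argument to optim.Adam itself to set a default for all groups that don't specify their own.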

Note that for optimizers other than plain-vanilla SGD, including Adam, weight decay and
L2 regularization are not technically exactly the same, although they both do pretty much
the same thing. See for example this discussion “Weight decay or L2 regularization?”
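
For what it's worth, torch.optim also provides AdamW, which implements the decoupled form of weight decay (it shrinks the weights directly rather than adding a penalty term to the gradient, as Adam does). If that distinction matters to you, the swap is a one-liner, sketched here with the same illustrative decay value:

import torch.optim as optim

# Adam: weight_decay is folded into the gradient (L2-regularization style)
# AdamW: weight_decay is applied directly to the weights (decoupled weight decay)
optimizer = optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.00001)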

I assume that by “weighted loss” you mean something like class weights such as
CrossEntropyLoss’s weight constructor argument. If so, this is separate from L2
regularization (or weight decay) and the two can sensibly be used together.
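
As a concrete (purely illustrative) example, the two live in different places and don't interact directly:

import torch
import torch.nn as nn

# hypothetical class weights for an imbalanced three-class problem
class_weights = torch.tensor([1.0, 2.0, 5.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)
# weight decay stays in the optimizer and acts on the parameters,
# while these weights only rescale each class's contribution to the loss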

Best.

K. Frank

Really appreciate the response, Frank. I have a quick question regarding the cyclic learning rate scheduler. From Leslie N. Smith's work I understand that before using the scheduler we must determine the min_lr and max_lr. To do this, an LR range test is used to find the point where the loss begins to decrease (as our min) and the point where the loss begins to diverge (as our max).

Is there any PyTorch implementation of this LR range test?

From what I've seen, fast.ai has an implementation, but I'm not sure if it's okay to use the torch_lr_finder package:

from torch_lr_finder import LRFinder
import torch.nn as nn
import torch.optim as optim

model = ...
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.1, weight_decay=1e-2)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
# trainloader / val_loader are the usual training and validation DataLoaders;
# the test sweeps the lr linearly from the optimizer's lr up to end_lr
lr_finder.range_test(trainloader, val_loader=val_loader, end_lr=1, num_iter=100, step_mode="linear")
lr_finder.plot(log_lr=False)  # look for where the loss starts to drop and where it diverges
lr_finder.reset()  # restores the model and optimizer to their initial state
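
Once I read the two values off the plot, I assume I would plug them into torch.optim.lr_scheduler.CyclicLR roughly like this (just a sketch; the base_lr, max_lr, and step_size_up values are made up):

import torch.optim as optim

base_lr, max_lr = 1e-5, 1e-3   # placeholder values read off the range-test plot
optimizer = optim.Adam(model.parameters(), lr=base_lr)
# cycle_momentum must be False for Adam, which has no 'momentum' hyperparameter
scheduler = optim.lr_scheduler.CyclicLR(optimizer, base_lr=base_lr, max_lr=max_lr,
                                        step_size_up=2000, cycle_momentum=False)

for inputs, targets in trainloader:
    ...  # forward pass, loss, backward, optimizer.step()
    scheduler.step()  # CyclicLR is stepped once per batch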

And if I were to use a subset of my data (20% of 35k images) to tune hyperparameters, is the general consensus that results from such a subset are likely to carry over to the full training data?