Don't Regularize Me, Bro

Thanks for the tips, guys. Your suggestions, plus a look at the `.step()` method of an optimizer, provided some key insight. Here's a compact example for reference, in case anyone else needs this:

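```python
import torch.nn as nn

# Layer arguments elided; fill in your own channel counts, kernel sizes, etc.
model = nn.Sequential(
    nn.Conv2d(...),   # model[0]
    nn.PReLU(...),    # model[1]
    nn.Conv2d(...),   # model[2]
    nn.PReLU(...),    # model[3]
    nn.Conv2d(...),   # model[4]
)

params = [
    # Conv layers: no 'weight_decay' key, so they use the optimizer's default
    {'params': model[0].parameters()},
    {'params': model[2].parameters()},
    {'params': model[4].parameters()},

    # PReLU slopes: explicitly exempt from weight decay
    {'params': model[1].parameters(), 'weight_decay': 0},
    {'params': model[3].parameters(), 'weight_decay': 0},
]
```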
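Listing layer indices by hand is fine for a five-layer `Sequential`, but it gets tedious for bigger models. A minimal sketch of the same idea that groups parameters by module type instead, assuming you want to exempt every `nn.PReLU` (the helper `split_decay_groups` and the layer sizes are hypothetical, standing in for the elided arguments):

```python
import torch.nn as nn

def split_decay_groups(model, no_decay_types=(nn.PReLU,)):
    """Hypothetical helper: parameters owned by modules of a type in
    no_decay_types go into a group with weight_decay=0; everything else
    goes into a group that inherits the optimizer's default weight_decay."""
    decay, no_decay = [], []
    for module in model.modules():
        # recurse=False so each parameter is collected once,
        # by the module that directly owns it
        owned = list(module.parameters(recurse=False))
        if isinstance(module, no_decay_types):
            no_decay.extend(owned)
        else:
            decay.extend(owned)
    return [
        {'params': decay},
        {'params': no_decay, 'weight_decay': 0},
    ]

# Hypothetical layer sizes, only so the example runs
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.PReLU(8),
    nn.Conv2d(8, 8, 3, padding=1),
    nn.PReLU(8),
    nn.Conv2d(8, 3, 3, padding=1),
)
params = split_decay_groups(model)
```

The resulting two groups can be passed to `optim.SGD` exactly like the five hand-written ones; new PReLU layers are then picked up automatically.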

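Then pass the groups to the optimizer. Any group that doesn't set its own `weight_decay` falls back to the default given here:

```python
import torch.optim as optim

opt = optim.SGD(params, lr=0.002, weight_decay=1e-5)
```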
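A quick sanity check of the grouping, under hypothetical layer sizes: with every gradient set to zero, one SGD step moves a parameter only through weight decay, so the conv weights should shrink by a factor of `1 - lr * weight_decay` while the PReLU slopes stay exactly the same. The model is cast to float64 here because a relative change of `0.002 * 1e-5 = 2e-8` is too small to show up in float32.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
# Hypothetical layer sizes; float64 so the tiny decay step is representable
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.PReLU(8),
    nn.Conv2d(8, 8, 3, padding=1),
    nn.PReLU(8),
    nn.Conv2d(8, 3, 3, padding=1),
).double()

params = [
    {'params': model[0].parameters()},
    {'params': model[2].parameters()},
    {'params': model[4].parameters()},
    {'params': model[1].parameters(), 'weight_decay': 0},
    {'params': model[3].parameters(), 'weight_decay': 0},
]
lr, wd = 0.002, 1e-5
opt = optim.SGD(params, lr=lr, weight_decay=wd)

# Groups without their own 'weight_decay' inherit the optimizer default
decays = [g['weight_decay'] for g in opt.param_groups]

conv_before = model[0].weight.detach().clone()
prelu_before = model[1].weight.detach().clone()

# Zero gradients: the only thing that can change a parameter is weight decay
for p in model.parameters():
    p.grad = torch.zeros_like(p)
opt.step()

conv_after = model[0].weight.detach()
prelu_after = model[1].weight.detach()
```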

Using two separate optimizers might be problematic because of this line in SGD's `.step()`: `p.data.add_(-group['lr'], d_p)`
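For reference, here is a hand-written sketch of the plain SGD update that line belongs to (no momentum, dampening, or Nesterov), assuming the rule `d_p = grad + weight_decay * p`, then `p -= lr * d_p`, applied per parameter group. The function name `manual_sgd_step` is hypothetical. Newer PyTorch releases write the in-place update as `p.add_(d_p, alpha=-lr)`, since the positional `add_(scalar, tensor)` overload is deprecated.

```python
import torch

@torch.no_grad()
def manual_sgd_step(param_groups):
    """Hypothetical re-implementation of plain SGD (no momentum) with
    per-group lr and weight_decay, mirroring the quoted update line."""
    for group in param_groups:
        lr, wd = group['lr'], group['weight_decay']
        for p in group['params']:
            if p.grad is None:
                continue
            d_p = p.grad
            if wd != 0:
                # L2 penalty enters as an extra gradient term: grad + wd * p
                d_p = d_p.add(p, alpha=wd)
            # Same update as p.data.add_(-group['lr'], d_p)
            p.add_(d_p, alpha=-lr)

# Tiny example: grad = 0.5, lr = 0.1, weight_decay = 0.01
w = torch.tensor([1.0, -2.0], requires_grad=True)
w.grad = torch.tensor([0.5, 0.5])
manual_sgd_step([{'params': [w], 'lr': 0.1, 'weight_decay': 0.01}])
# d_p = [0.51, 0.48], so w becomes [0.949, -2.048]

# Compare against torch.optim.SGD on an identical parameter
w_ref = torch.tensor([1.0, -2.0], requires_grad=True)
w_ref.grad = torch.tensor([0.5, 0.5])
torch.optim.SGD([w_ref], lr=0.1, weight_decay=0.01).step()
```

Because `weight_decay` is read from each group, a group with `weight_decay: 0` simply skips the penalty term while still receiving its regular gradient update.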
