Thanks for the tips, guys. Your suggestions, plus a look at the .step()
method of an optimizer, provided some key insight. Here’s a compact example for reference, in case anyone else needs this:
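from torch import nn, optim

model = nn.Sequential(
    nn.Conv2d(...),
    nn.PReLU(...),
    nn.Conv2d(...),
    nn.PReLU(...),
    nn.Conv2d(...),
)
params = [
    # Conv layers: no per-group override, so they use the optimizer-wide weight_decay
    { 'params' : model[0].parameters() },
    { 'params' : model[2].parameters() },
    { 'params' : model[4].parameters() },
    # PReLU layers: override weight_decay to 0 for these groups only
    { 'params' : model[1].parameters(), 'weight_decay':0 },
    { 'params' : model[3].parameters(), 'weight_decay':0 },
]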
Then…
opt = optim.SGD(params, lr=0.002, weight_decay=1e-5)
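For anyone who wants something to paste and run, here is a minimal sketch of the same idea. The channel counts, kernel sizes, and input shape are made up for illustration (the snippet above leaves them out), and I group by module type rather than by index, which is just one way to do it.

```python
import torch
from torch import nn, optim

# Hypothetical layer sizes, chosen only so the example runs.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.PReLU(8),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
    nn.PReLU(8),
    nn.Conv2d(8, 3, kernel_size=3, padding=1),
)

# Collect PReLU parameters separately from everything else.
decay, no_decay = [], []
for module in model:
    if isinstance(module, nn.PReLU):
        no_decay.extend(module.parameters())
    else:
        decay.extend(module.parameters())

opt = optim.SGD(
    [
        {"params": decay},                          # inherits weight_decay=1e-5
        {"params": no_decay, "weight_decay": 0.0},  # overrides the default
    ],
    lr=0.002,
    weight_decay=1e-5,
)

# Inspect what each group ended up with.
group_decays = [g["weight_decay"] for g in opt.param_groups]
group_sizes = [len(g["params"]) for g in opt.param_groups]

# One training step on random data, just to show a single optimizer drives both groups.
x = torch.randn(1, 3, 16, 16)
loss = model(x).pow(2).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

Checking opt.param_groups after construction is a quick way to confirm that each group picked up the weight_decay you intended.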
Using two separate optimizers might be problematic because of this line in the optimizer's .step() method: p.data.add_(-group['lr'], d_p)
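For context, here is a simplified sketch of what I understand that update to compute for a single parameter. `sgd_update` is a hypothetical helper of my own, not part of torch.optim, and it leaves out momentum, dampening, and Nesterov. It also uses the keyword form `add(tensor, alpha=scalar)`, since the two-argument `add_(scalar, tensor)` form was deprecated in later PyTorch releases.

```python
import torch

def sgd_update(p, grad, lr, weight_decay):
    """Plain SGD step for one parameter tensor (sketch, no momentum)."""
    d_p = grad
    if weight_decay != 0:
        # Weight decay is folded into the gradient: d_p = grad + weight_decay * p
        d_p = d_p.add(p, alpha=weight_decay)
    # Equivalent of p.data.add_(-group['lr'], d_p), returned as a new tensor
    return p.add(d_p, alpha=-lr)

p = torch.tensor([1.0, -2.0])
grad = torch.tensor([0.5, 0.5])

with_decay = sgd_update(p, grad, lr=0.1, weight_decay=0.1)
without_decay = sgd_update(p, grad, lr=0.1, weight_decay=0.0)
```

The point is that lr and weight_decay are both read from the parameter's own group, so per-group settings inside one optimizer are enough to treat the PReLU parameters differently.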