Optimizer Parameters

I’m pretty new to PyTorch and I’m working on a network that uses some PReLU activation layers and an Adam optimizer. Since I want to try Adam with weight decay, I’ve been reading that I shouldn’t use it with PReLU, because the decay would push its “a” parameter towards some small value over the epochs and I would essentially end up with a ReLU or a LeakyReLU.
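
Concretely, what I want to run is something like this (a simplified sketch; the lr and weight decay values are just placeholders):

import torch

# one Adam over all parameters, with weight decay turned on
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
# this applies the decay to every parameter, including the PReLU "a" values,
# which is exactly what I'd like to avoid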

To address this issue, I was thinking about excluding the PReLU layers from Adam’s params. The problem is that I don’t know how to do it, and I think I would then need a second Adam optimizer with weight decay = 0 dedicated only to the PReLU layers; otherwise they would not be optimized at all and their “a” parameters would never get updated.

Is this the right way to deal with this issue? If so, how could I exclude particular layers from Adam and add them to another Adam optimizer set with weight decay = 0? Thanks in advance.

Hello,

The model parameters are registered in optimizer.param_groups[0]['params']. So I think you could create two separate dicts, one for the PReLU parameters and one for the others. It may not be the best solution, but it is simple and it works.

import torch
import torch.nn as nn

prelu_ = []
others_ = []

for module in model.children():
    if isinstance(module, nn.PReLU):
        prelu_.append(module.weight)   # PReLU stores its learnable "a" as `weight`
    else:
        others_.append(module.weight)  # omit bias

# you could use one optimizer with two parameter groups like this
optimizer = torch.optim.Adam([{'params': prelu_, 'lr': prelu_lr, 'weight_decay': 0},
                              {'params': others_, 'lr': lr, 'weight_decay': wd}])

See the per-parameter options section of the torch.optim documentation for more details.
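
If your PReLU layers sit inside nested submodules, model.children() will not reach them, since it only visits the top-level children. A rough variant that walks every module instead (reusing the same prelu_lr, lr and wd placeholders as above) could look like this:

import torch
import torch.nn as nn

prelu_ = []
prelu_ids = set()
for module in model.modules():            # recurses into nested submodules
    if isinstance(module, nn.PReLU):
        for p in module.parameters():     # PReLU has a single parameter, its "a" (stored as `weight`)
            prelu_.append(p)
            prelu_ids.add(id(p))

# everything else (weights and biases alike) goes into the decayed group
others_ = [p for p in model.parameters() if id(p) not in prelu_ids]

optimizer = torch.optim.Adam([{'params': prelu_, 'lr': prelu_lr, 'weight_decay': 0},
                              {'params': others_, 'lr': lr, 'weight_decay': wd}])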


Hey, Marios

Thanks for the input. I’m trying your code at the moment and it seems to work like a charm. I had to adapt it to my model, because I built it from some custom classes that don’t define a weight attribute, but that’s not a big deal. For now the issue is solved; if something goes wrong, I’ll come back and share it with you.
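
For reference, my adaptation looks roughly like this (just a sketch; since my custom blocks don’t expose a weight attribute, I collect all of their parameters instead):

import torch
import torch.nn as nn

prelu_ = []
others_ = []

for module in model.children():
    if isinstance(module, nn.PReLU):
        prelu_.append(module.weight)         # PReLU's "a" is stored as `weight`
    else:
        others_.extend(module.parameters())  # my blocks: take every registered parameter, biases included

optimizer = torch.optim.Adam([{'params': prelu_, 'lr': prelu_lr, 'weight_decay': 0},
                              {'params': others_, 'lr': lr, 'weight_decay': wd}])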

Thank you!
