I am not able to understand the basics: why is model.parameters() not passed directly to the optimizer, and how is this new optimizer_grouped_parameters list created?
import torch.nn as nn
from torch.optim import AdamW

text_model = AADP()  # AADP is my own model class
text_model.to(device)
criterion = nn.BCELoss()
no_decay = ['bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
{'params': [p for n, p in text_model.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
{'params': [p for n, p in text_model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
]
optimizer = AdamW(optimizer_grouped_parameters, lr=3e-5)
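To see what the two list comprehensions do, here is a minimal sketch of the same `any(nd in n for nd in no_decay)` filter, using plain Python and hypothetical parameter names in place of what `named_parameters()` would yield:

```python
no_decay = ['bias', 'LayerNorm.weight']

# Hypothetical (name, parameter) pairs, as named_parameters() would yield them
named_params = [
    ('encoder.layer.0.attention.query.weight', 'W_q'),
    ('encoder.layer.0.attention.query.bias', 'b_q'),
    ('encoder.layer.0.LayerNorm.weight', 'ln_w'),
    ('encoder.layer.0.LayerNorm.bias', 'ln_b'),
]

# Group 1: names matching nothing in no_decay -> these get weight decay
decayed = [p for n, p in named_params
           if not any(nd in n for nd in no_decay)]

# Group 2: names containing 'bias' or 'LayerNorm.weight' -> no weight decay
not_decayed = [p for n, p in named_params
               if any(nd in n for nd in no_decay)]

print(decayed)      # ['W_q']
print(not_decayed)  # ['b_q', 'ln_w', 'ln_b']
```

So nothing new is created: every parameter from the model still reaches the optimizer, just partitioned into two dicts so that each group can carry its own `weight_decay` setting (optimizers like AdamW accept such a list of dicts in place of a flat parameter iterable).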