I am not able to understand the basics: why is model.parameters() not passed directly to the optimizer, and how is this new optimizer_grouped_parameters list created?
import torch.nn as nn
from torch.optim import AdamW

text_model = AADP()  # AADP is my own model class
text_model.to(device)
criterion = nn.BCELoss()
no_decay = ['bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
{'params': [p for n, p in text_model.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
{'params': [p for n, p in text_model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
]
optimizer = AdamW(optimizer_grouped_parameters, lr=3e-5)
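To see what the two list comprehensions do, here is a minimal sketch of the same `any(nd in n for nd in no_decay)` filter, using plain Python and hypothetical parameter names in place of what `named_parameters()` would yield:

```python
no_decay = ['bias', 'LayerNorm.weight']

# Hypothetical (name, parameter) pairs, as named_parameters() would yield them
named_params = [
    ('encoder.layer.0.attention.query.weight', 'W_q'),
    ('encoder.layer.0.attention.query.bias', 'b_q'),
    ('encoder.layer.0.LayerNorm.weight', 'ln_w'),
    ('encoder.layer.0.LayerNorm.bias', 'ln_b'),
]

# Group 1: names matching nothing in no_decay -> these get weight decay
decayed = [p for n, p in named_params
           if not any(nd in n for nd in no_decay)]

# Group 2: names containing 'bias' or 'LayerNorm.weight' -> no weight decay
not_decayed = [p for n, p in named_params
               if any(nd in n for nd in no_decay)]

print(decayed)      # ['W_q']
print(not_decayed)  # ['b_q', 'ln_w', 'ln_b']
```

So nothing new is created: every parameter from the model still reaches the optimizer, just partitioned into two dicts so that each group can carry its own `weight_decay` setting (optimizers like AdamW accept such a list of dicts in place of a flat parameter iterable).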