Different learning rates for Adam give precision and recall of 0

Hi there!
I am fairly new to PyTorch, and I am trying to set a different (lower) learning rate for the BERT parameters while the rest of the model's parameters share the same lr.

My model class looks like this (it’s from a tutorial):

class BERTGRUModel(nn.Module):
    def __init__(self, bert, hidden_dim, output_dim, n_layers, bidirectional, dropout):
        super().__init__()
        self.bert = bert
        embedding_dim = bert.config.to_dict()['hidden_size']
        self.rnn = nn.GRU(embedding_dim,
                          hidden_dim,
                          num_layers = n_layers,
                          bidirectional = bidirectional,
                          batch_first = True,
                          dropout = 0 if n_layers < 2 else dropout)
        self.dropout = nn.Dropout(dropout)
        self.out = nn.Linear(hidden_dim * 2 if bidirectional else hidden_dim, output_dim)

And I am trying to give the different learning rates like this:

optimizer = optim.Adam([{'params': model.rnn.parameters(), 'lr': 0.001},
                        {'params': model.out.parameters(), 'lr': 0.001},
                        {'params': model.dropout.parameters(), 'lr': 0.001},
                        {'params': model.bert.parameters(), 'lr': 1e-5}])

But this performs worse than simply giving:

optimizer = optim.Adam(model.parameters())

It gives worse accuracy, and the precision and recall drop to 0 (for every batch).

I’m sure I am doing something wrong, I just can’t figure out what.

The dropout layer doesn’t have trainable parameters, but besides that the code looks alright.
Since your model is training with the “standard” approach, could you use your per-parameter optimizer with the same learning rate for all parameters, as you’ve used in the working approach?
Assuming the model then trains fine, the currently specified learning rates are most likely just unsuitable for your new approach.
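To illustrate the suggested sanity check, here is a minimal sketch (with a tiny stand-in model, not your actual BERT setup): build the optimizer with the same per-parameter groups, but pass one shared learning rate, e.g. Adam's default of 1e-3, via the `lr` keyword. If this trains like `optim.Adam(model.parameters())`, the grouping is correct and only the lr values need tuning.

```python
import torch.nn as nn
import torch.optim as optim

class TinyModel(nn.Module):
    """Hypothetical stand-in so the example is self-contained."""
    def __init__(self):
        super().__init__()
        self.bert = nn.Linear(8, 8)                 # stand-in for the BERT encoder
        self.rnn = nn.GRU(8, 4, batch_first=True)
        self.out = nn.Linear(4, 2)

model = TinyModel()

# Same groups as before, but every group falls back to the shared default lr.
# Note there is no group for model.dropout: nn.Dropout has no parameters.
optimizer = optim.Adam([
    {'params': model.rnn.parameters()},
    {'params': model.out.parameters()},
    {'params': model.bert.parameters()},
], lr=1e-3)

# Once this matches the baseline run, lower only the BERT group, e.g.:
# optimizer = optim.Adam([
#     {'params': model.rnn.parameters()},
#     {'params': model.out.parameters()},
#     {'params': model.bert.parameters(), 'lr': 2e-5},
# ], lr=1e-3)
```

Groups that omit `'lr'` inherit the optimizer-level default, so changing one keyword switches the whole setup between the "same lr everywhere" check and the discriminative-lr version.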

You were right, thank you! It was the value I gave to the learning rate.