Fine Tuning: different learning rates for different parameters

I have created custom layers nested within one another, the first of which uses an Embedding layer. I’m loading pretrained embeddings into it with freeze=False and want to train it with the rest of the model, but at a lower learning rate. When creating the optimizer, I can pass this layer’s parameters up through the nested layers, but that is ugly and hacky. The same parameters are also already contained in model.parameters(), which causes conflicts when providing per-parameter options, since a parameter cannot appear in more than one group. Is there a cleaner way to do this? The code below shows how I’m doing it right now.

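import torch.nn as nn
import torch.optim as optim

class layer1(nn.Module):
    def __init__(self, emb_size, pretrained):
        super().__init__()  # must run before any submodule is assigned
        self.embedding_layer = nn.Embedding.from_pretrained(pretrained, freeze=False)
        # parameters() returns a one-shot generator, so keep a list instead
        self.fine_tune = list(self.embedding_layer.parameters())
        #other parameters/layers/operations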

class layer2(nn.Module):
    def __init__(self, emb_size, pretrained, proj_size):
        super().__init__()  # must run before any submodule is assigned
        self.layer1 = layer1(emb_size, pretrained)
        self.fine_tune = self.layer1.fine_tune
        #other parameters/layers/operations

class myModel(nn.Module):
    def __init__(self, emb_size, pretrained, proj_size, attention_dim):
        super().__init__()  # must run before any submodule is assigned
        self.layer2 = layer2(emb_size, pretrained, proj_size)
        self.fine_tune = self.layer2.fine_tune
        #other parameters/layers/operations

model = myModel(emb_size, pretrained, proj_size, attention_dim)
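# Materialize once: model.fine_tune may be a generator that can only be consumed a single time
fine_tune_params = list(model.fine_tune)
fine_tune_id = id(fine_tune_params[0])
normal_params = []
for param in model.parameters():
    if id(param) != fine_tune_id:
        normal_params.append(param)
# The second group is an assumption: its original learning rate is not shown, so Adam's default applies
opt = optim.Adam([{'params': fine_tune_params, 'lr': 1e-4},
                  {'params': normal_params}])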
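
One possibly cleaner option is to skip the fine_tune attribute entirely and split the parameters by name when building the optimizer, since named_parameters() already walks the nested modules and yields qualified names such as "inner.embedding_layer.weight". The sketch below is only an illustration under assumptions: Inner, Outer, split_param_groups, the "embedding_layer" keyword, the random placeholder weights, and the 1e-3 default learning rate are all hypothetical stand-ins, not part of the code above.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class Inner(nn.Module):
    # Hypothetical stand-in for layer1: owns the pretrained embedding.
    def __init__(self, pretrained):
        super().__init__()
        self.embedding_layer = nn.Embedding.from_pretrained(pretrained, freeze=False)

    def forward(self, idx):
        return self.embedding_layer(idx)


class Outer(nn.Module):
    # Hypothetical stand-in for the nested wrappers (layer2 / myModel).
    def __init__(self, pretrained, out_dim):
        super().__init__()
        self.inner = Inner(pretrained)
        self.proj = nn.Linear(pretrained.size(1), out_dim)

    def forward(self, idx):
        return self.proj(self.inner(idx))


def split_param_groups(model, keyword="embedding_layer"):
    """Split parameters into (fine_tune, normal) by matching the qualified name."""
    fine_tune, normal = [], []
    for name, param in model.named_parameters():
        if keyword in name:
            fine_tune.append(param)
        else:
            normal.append(param)
    return fine_tune, normal


pretrained = torch.randn(10, 4)  # placeholder weights, not real embeddings
model = Outer(pretrained, out_dim=3)
fine_tune, normal = split_param_groups(model)

opt = optim.Adam(
    [
        {"params": fine_tune, "lr": 1e-4},  # lower rate for the embedding
        {"params": normal},                 # uses the default lr given below
    ],
    lr=1e-3,  # assumed default rate for everything else
)
```

Because each parameter lands in exactly one list, the two groups never overlap, and nothing has to be threaded through the intermediate layers. Matching on a substring of the name is a convenience that depends on the attribute name staying unique; comparing against model.inner.embedding_layer.parameters() by identity would be a stricter alternative.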