I have a seq2seq model with an encoder and a decoder. The architecture is structured as follows:
# define the encoder
class Encoder(nn.Module):
    ...
# define the decoder
class Decoder(nn.Module):
    ...
# define the seq2seq model
# the encoder and decoder are passed to it
class Seq2Seq(nn.Module):
    ...
Then, we create the network as follows:
encoder_net = Encoder(...)
decoder_net = Decoder(...)
model = Seq2Seq(encoder_net, decoder_net, ...)
The decoder network contains a fairly complicated sub-network, let's say on top of the nn.GRU(). I want to optimize this part with one learning rate and everything else with another learning rate. How can I do that?
Can I do it like this?
optim.SGD([
    {'params': model.decoder_net.layer_x.parameters(), 'lr': 1e-4},
    {'params': model.parameters()}
], lr=1e-3, momentum=0.9)
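My concern with the snippet above is that model.parameters() also yields the parameters of layer_x, and as far as I know PyTorch raises a ValueError when the same parameter appears in more than one parameter group. So my guess is that the second group has to exclude layer_x explicitly. Here is a minimal sketch of what I mean. The toy Encoder/Decoder bodies, the layer sizes, and the attribute names encoder_net/decoder_net inside Seq2Seq are placeholders I made up for illustration, not my real model:

```python
import torch.nn as nn
import torch.optim as optim

# Toy stand-ins for the real modules; bodies and sizes are hypothetical.
class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(8, 16, batch_first=True)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(8, 16, batch_first=True)
        self.layer_x = nn.Linear(16, 8)  # the part that should get its own LR

class Seq2Seq(nn.Module):
    def __init__(self, encoder_net, decoder_net):
        super().__init__()
        # Assumption: the submodules are stored under these attribute names.
        self.encoder_net = encoder_net
        self.decoder_net = decoder_net

model = Seq2Seq(Encoder(), Decoder())

# Parameters that get the special learning rate.
special = list(model.decoder_net.layer_x.parameters())
special_ids = {id(p) for p in special}

# Everything else, filtered by identity so that no parameter lands in both groups.
rest = [p for p in model.parameters() if id(p) not in special_ids]

optimizer = optim.SGD(
    [
        {'params': special, 'lr': 1e-4},
        {'params': rest},  # no 'lr' key, so it uses the default lr below
    ],
    lr=1e-3,
    momentum=0.9,
)
```

With this, optimizer.param_groups should contain two groups, the first using lr=1e-4 and the second falling back to lr=1e-3, with momentum=0.9 applied to both.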