I am wondering whether using multiple optimizers changes the performance of a network compared to using a single optimizer.
Say I have an encoder-decoder network that contains three encoders and three decoders.
In this case, should I define each module separately and give each one its own optimizer, as below?
encoder1_optimizer = optim.Adam(encoder1.parameters(), lr=learning_rate)
encoder2_optimizer = optim.Adam(encoder2.parameters(), lr=learning_rate)
encoder3_optimizer = optim.Adam(encoder3.parameters(), lr=learning_rate)
decoder1_optimizer = optim.Adam(decoder1.parameters(), lr=learning_rate)
decoder2_optimizer = optim.Adam(decoder2.parameters(), lr=learning_rate)
decoder3_optimizer = optim.Adam(decoder3.parameters(), lr=learning_rate)
Or should I define the entire architecture in one class and use one optimizer?
AllEncoder_Decoder_optimizer = optim.Adam(AllEncoder_Decoder.parameters(), lr=learning_rate)
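
For comparison, here is a minimal sketch of both setups side by side. It assumes tiny `nn.Linear` layers as hypothetical stand-ins for the encoders and decoders, a made-up reconstruction loss, and random input; none of these come from the real model. Adam keeps its moment estimates per parameter, so with identical hyperparameters and every optimizer stepped on every iteration, the two setups should produce the same weights up to floating-point noise.

```python
import copy
import itertools

import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical stand-ins for the three encoders and three decoders.
encoders = [nn.Linear(8, 4) for _ in range(3)]
decoders = [nn.Linear(4, 8) for _ in range(3)]
modules_a = encoders + decoders
modules_b = copy.deepcopy(modules_a)  # identical starting weights

learning_rate = 1e-3

# Option A: one Adam optimizer per module.
separate_optimizers = [
    optim.Adam(m.parameters(), lr=learning_rate) for m in modules_a
]

# Option B: a single Adam optimizer over the chained parameters of all modules.
single_optimizer = optim.Adam(
    itertools.chain.from_iterable(m.parameters() for m in modules_b),
    lr=learning_rate,
)


def forward_loss(modules, x):
    """Sum of reconstruction losses over the three encoder/decoder pairs."""
    encs, decs = modules[:3], modules[3:]
    return sum(((dec(enc(x)) - x) ** 2).mean() for enc, dec in zip(encs, decs))


x = torch.randn(16, 8)  # made-up input batch
for _ in range(5):
    # Option A: zero, backpropagate once, then step every optimizer.
    for opt in separate_optimizers:
        opt.zero_grad()
    forward_loss(modules_a, x).backward()
    for opt in separate_optimizers:
        opt.step()

    # Option B: the same update with one optimizer.
    single_optimizer.zero_grad()
    forward_loss(modules_b, x).backward()
    single_optimizer.step()

# Largest absolute difference between corresponding parameters.
max_diff = max(
    (pa - pb).abs().max().item()
    for ma, mb in zip(modules_a, modules_b)
    for pa, pb in zip(ma.parameters(), mb.parameters())
)
print(f"largest parameter difference: {max_diff:.2e}")
```

Under these assumptions the choice is mostly about bookkeeping. Separate optimizers become relevant when the parts need different learning rates or are stepped at different times, and even different learning rates can be handled by one optimizer with per-parameter groups.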