Adam with different learning rates per layer

Hello,

I’d like to specify different learning rates for different layers in my optimizer. The thing is that my network has two separate outputs.
Here’s the code:

class AC(nn.Module):

	def __init__(self, env_infos):
		nn.Module.__init__(self)
		self.env_infos = env_infos
		self.p1 = nn.Linear(env_infos[0], 10)
		self.p2 = nn.Linear(10, env_infos[1])

		self.v1 = nn.Linear(env_infos[0], 10)
		self.v2 = nn.Linear(10, 1)

		# I first tried
		# self.a_1 = optim.Adam([self.p1, self.p2], 5e-3)
		# self.a_2 = optim.Adam([self.v1, self.v2], 1e-2)

		self.a_1 = optim.Adam([self.p1.parameters(), self.p2.parameters()], 5e-3)
		self.a_2 = optim.Adam([self.v1.parameters(), self.v2.parameters()], 1e-2)

Since the optimizer works only with Variables, should I pass it the state_dict?

Thanks!

The state_dict might contain other stuff too.

The problem is that [self.p1.parameters(), self.p2.parameters()] is a list containing two generators, not a list containing a bunch of parameters.

Replace it with one of these (see the sketch after the list):

  • [*self.p1.parameters(), *self.p2.parameters()]
  • list(self.p1.parameters()) + list(self.p2.parameters())
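For example, here is a minimal standalone sketch using the same layer shapes and learning rates as above (the env_infos values are made up just for illustration; either unpacking form behaves the same way):

	import torch.nn as nn
	import torch.optim as optim

	env_infos = [4, 2]  # hypothetical input/output sizes, for illustration only
	p1, p2 = nn.Linear(env_infos[0], 10), nn.Linear(10, env_infos[1])
	v1, v2 = nn.Linear(env_infos[0], 10), nn.Linear(10, 1)

	# Unpack the generators so Adam receives actual Parameters
	a_1 = optim.Adam([*p1.parameters(), *p2.parameters()], lr=5e-3)
	a_2 = optim.Adam([*v1.parameters(), *v2.parameters()], lr=1e-2)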

For your use case, the Per-Parameter Options section in the PyTorch documentation might have good info, too.
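As a rough sketch of what that could look like here, assuming env_infos and an AC instance named model built as in the first post (both names are illustrative), you would use a single optimizer with two parameter groups, each carrying its own learning rate:

	import torch.optim as optim

	model = AC(env_infos)
	optimizer = optim.Adam([
	    {"params": [*model.p1.parameters(), *model.p2.parameters()], "lr": 5e-3},
	    {"params": [*model.v1.parameters(), *model.v2.parameters()], "lr": 1e-2},
	])

A single optimizer.step() call then updates both parts of the network with their respective learning rates.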

Best regards

Thomas


Yep, it works like this! Thanks!