Adam with different learning rates


I’d like to specify different learning rates for different layers in my optimizer. The thing is that my network has two separate outputs.
Here’s the code:

import torch.nn as nn
import torch.optim as optim

class AC(nn.Module):

    def __init__(self, env_infos):
        super().__init__()  # must run before any submodule is assigned

        self.env_infos = env_infos
        # first output branch
        self.p1 = nn.Linear(env_infos[0], 10)
        self.p2 = nn.Linear(10, env_infos[1])

        # second output branch (single value)
        self.v1 = nn.Linear(env_infos[0], 10)
        self.v2 = nn.Linear(10, 1)

        # I first tried
        # self.a_1 = optim.Adam([self.p1, self.p2], 5e-3)
        # self.a_2 = optim.Adam([self.v1, self.v2], 1e-2)

        # Then I tried
        self.a_1 = optim.Adam([self.p1.parameters(), self.p2.parameters()], 5e-3)
        self.a_2 = optim.Adam([self.v1.parameters(), self.v2.parameters()], 1e-2)

Since the optimizer works only with Variables, should I pass it the state_dict?

The state_dict might contain other things too, such as buffers, not just the trainable parameters.
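As a rough illustration of that point, the sketch below compares the names returned by named_parameters() with the keys of state_dict() for a single layer. nn.BatchNorm1d is used only as a convenient example of a module with buffers; it is not part of the original network.

```python
import torch.nn as nn

# BatchNorm keeps its running statistics as buffers, not as parameters.
# This layer is an illustrative assumption, not part of the AC class.
layer = nn.BatchNorm1d(4)

# Names of the trainable tensors an optimizer should receive
param_names = [name for name, _ in layer.named_parameters()]

# Keys of the state_dict: parameters plus persistent buffers
state_keys = list(layer.state_dict().keys())

print(param_names)
print(state_keys)
```

The state_dict is also a mapping from names to tensors, so it is meant for saving and loading rather than for handing to an optimizer.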

The problem is that [self.p1.parameters(), self.p2.parameters()] is a list containing two generators, not a list containing a bunch of parameters.

Replace it with one of these:

  • [*self.p1.parameters(), *self.p2.parameters()]
  • list(self.p1.parameters()) + list(self.p2.parameters())

For your use case, the Per-Parameter Options section in the PyTorch documentation might have good info, too.
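A minimal sketch of the per-parameter options approach, assuming a single Adam optimizer is acceptable in place of the two separate ones in the question. The layer sizes (obs_dim, n_actions) are hypothetical stand-ins for env_infos, and the layers are built outside a module only to keep the example short.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical sizes standing in for env_infos[0] and env_infos[1]
obs_dim, n_actions = 4, 2

# Same layer shapes as in the question
p1, p2 = nn.Linear(obs_dim, 10), nn.Linear(10, n_actions)
v1, v2 = nn.Linear(obs_dim, 10), nn.Linear(10, 1)

# One optimizer, two parameter groups, each with its own learning rate
optimizer = optim.Adam([
    {"params": [*p1.parameters(), *p2.parameters()], "lr": 5e-3},
    {"params": [*v1.parameters(), *v2.parameters()], "lr": 1e-2},
])

# Each group keeps its own hyperparameters
lrs = [group["lr"] for group in optimizer.param_groups]
print(lrs)
```

Keeping two separate optimizers, as in the question, also works once each one receives a flat list of parameters; a single optimizer with groups just means one step() and one zero_grad() call per update.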

It works like this!