How to put different groups of trainable parameters into an optimizer in PyTorch

My problem is how to train different groups of parameters with an optimizer in PyTorch.
I have a basic fully-connected neural network with trainable weights and biases, plus two extra trainable parameters (lambda_1 and lambda_2), so I can form two groups of parameters. Group one contains the weights and biases; group two contains the weights, biases, lambda_1, and lambda_2. My goal is to pass each group into its own optimizer. I tried to implement it but failed. Here is my code.

print("group one: weight, biase")
for p in model.parameter_wb():
  print(p)

print("\n")
print("group two: weight, biase and lambda")
for p in model.parameter_wb_lambda():
  print(p)

Why do model.parameter_wb() and model.parameter_wb_lambda() produce the same result (both of them produce the parameters of group two)? I hope to get two different groups of trainable parameters.

import torch
import torch.nn as nn
import torch.optim as optim

from collections import OrderedDict

class FNN(torch.nn.Module):
  def __init__(self, layers):
    super(FNN, self).__init__()
    
    # parameters
    self.depth = len(layers) - 1
    
    # set up layer order dict
    # activation (alternatives: torch.nn.Tanhshrink, torch.nn.functional.tanh)
    self.activation = torch.nn.Tanh
    
    layer_list = list()
    for i in range(self.depth - 1): 
        layer_list.append(
            ('layer_%d' % i, torch.nn.Linear(layers[i], layers[i+1]))
        )
        layer_list.append(('activation_%d' % i, self.activation()))
        
    layer_list.append(
        ('layer_%d' % (self.depth - 1), torch.nn.Linear(layers[-2], layers[-1]))
    )
    layerDict = OrderedDict(layer_list)
    
    # deploy layers
    self.layers = torch.nn.Sequential(layerDict)
    
  def forward(self, x):
      out = self.layers(x)
      return out

class Model():
  def __init__(self,  layers, nn):   
    self.nn = nn
    # deep neural networks
    self.layers = layers
    
    if self.nn == "FNN":
      self.multi_task_model = FNN(self.layers)

    # group 1: weights and biases
    self.parameter_wb = self.multi_task_model.parameters

    # another two trainable parameters
    self.lambda_1 = torch.tensor([0.0], requires_grad=True)
    self.lambda_2 = torch.tensor([-6.0], requires_grad=True)
    
    self.lambda_1 = torch.nn.Parameter(self.lambda_1)
    self.lambda_2 = torch.nn.Parameter(self.lambda_2)
    
    # register parameter
    self.multi_task_model.register_parameter('lambda_1', self.lambda_1)
    self.multi_task_model.register_parameter('lambda_2', self.lambda_2)

    # group 2: weights, biases and lambda
    self.parameter_wb_lambda = self.multi_task_model.parameters

layers = [1,3,3,2]

nn = "FNN"
model = Model(layers, nn)

print("print trainable parameter: weight, biase")
for p in model.parameter_wb():
  print(p)

print("\n")
print("print trainable parameter: weight, biase and lambda")
for p in model.parameter_wb_lambda():
  print(p)

Hi Xiong!

Your problem is that:

    self.parameter_wb = self.multi_task_model.parameters

assigns the parameters method of your multi_task_model to
parameter_wb, but doesn’t evaluate that method.

Then:

    self.parameter_wb_lambda = self.multi_task_model.parameters

assigns that same method to parameter_wb_lambda, again without
evaluating it. It doesn’t matter that you registered two more Parameters
in between the two assignments.

It’s only when you later call:

for p in model.parameter_wb():
# and
for p in model.parameter_wb_lambda():

that the parameters method is actually evaluated and it returns, both
times, the (generator for the) list of all of the Parameters that have been
registered at the time of evaluation.
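
You can see the same effect in isolation. Here is a minimal sketch using a plain one-layer Linear module (a stand-in, not your Model class):

    import torch

    m = torch.nn.Linear(2, 2)       # stand-in module with a weight and a bias
    get_params = m.parameters       # stores the bound method; nothing is evaluated yet

    print(len(list(get_params())))  # 2 -- weight and bias

    # register one more Parameter *after* the assignment above
    m.register_parameter('lambda_1', torch.nn.Parameter(torch.tensor([0.0])))

    print(len(list(get_params())))  # 3 -- the new Parameter shows up as well

Both calls go through the same bound method, so each one sees whatever has been registered by the time it runs.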

Best.

K. Frank

Hi Frank, thank you so much for your nice answer. I now understand why the same result is produced by

for p in model.parameter_wb():
# and
for p in model.parameter_wb_lambda():

I hope to pass a different group of parameters into each optimizer: self.parameter_wb() should produce the weights and biases, and self.parameter_wb_lambda() should produce the weights, biases, lambda_1, and lambda_2. Then I can build two different optimizers. I have no idea how to implement this. Could you please give some advice?
Thank you so much anyway.

self.optimizer_1 = optim.Adam(self.parameter_wb(), lr=0.001)
self.optimizer_2 = optim.Adam(self.parameter_wb_lambda(), lr=0.001)
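
For reference, here is a minimal sketch of one way to build the two groups, based on the Model class above (names and learning rates are just examples). The idea is to materialize group one into a Python list before registering lambda_1 and lambda_2; list() evaluates the parameters generator at that moment, so the snapshot is not affected by later registrations. Group two is then that list plus the two lambdas. Note that parameter_wb and parameter_wb_lambda become lists rather than methods, so they are iterated directly (for p in model.parameter_wb:) instead of being called.

# inside Model.__init__, after self.multi_task_model has been built:

# group 1: weights and biases -- list() evaluates the generator now,
# so this snapshot is not changed by parameters registered later
self.parameter_wb = list(self.multi_task_model.parameters())

# the two extra trainable parameters
self.lambda_1 = torch.nn.Parameter(torch.tensor([0.0]))
self.lambda_2 = torch.nn.Parameter(torch.tensor([-6.0]))
self.multi_task_model.register_parameter('lambda_1', self.lambda_1)
self.multi_task_model.register_parameter('lambda_2', self.lambda_2)

# group 2: weights, biases and the two lambdas
self.parameter_wb_lambda = self.parameter_wb + [self.lambda_1, self.lambda_2]

# one optimizer per group
self.optimizer_1 = optim.Adam(self.parameter_wb, lr=0.001)
self.optimizer_2 = optim.Adam(self.parameter_wb_lambda, lr=0.001)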