I'm trying to multiply the parameters of one model (model A) by a scalar $\lambda$ to get another model (model B) that has the same architecture as A but different parameters. Then I feed a tensor into model B and get the output. I want to calculate the gradient of the output with respect to $\lambda$, but calling .backward() doesn't produce one. Specifically, I run the following program:
import torch
import torch.nn as nn

class MyBaseModel(nn.Module):
    def __init__(self):
        super(MyBaseModel, self).__init__()
        self.linear1 = nn.Linear(3, 8)
        self.act1 = nn.ReLU()
        self.linear2 = nn.Linear(8, 4)
        self.act2 = nn.Sigmoid()
        self.linear3 = nn.Linear(4, 5)

    def forward(self, x):
        return self.linear3(self.act2(self.linear2(self.act1(self.linear1(x)))))

class WeightedSumModel(nn.Module):
    def __init__(self):
        super(WeightedSumModel, self).__init__()
        self.lambda_ = nn.Parameter(torch.tensor(2.0))
        self.a = MyBaseModel()
        self.b = MyBaseModel()

    def forward(self, x):
        # copy a's parameters, scaled by lambda_, into b
        for para_b, para_a in zip(self.b.parameters(), self.a.parameters()):
            para_b.data = para_a.data * self.lambda_
        return self.b(x).sum()

input_tensor = torch.ones((2, 3))
weighted_sum_model = WeightedSumModel()
output_tensor = weighted_sum_model(input_tensor)
output_tensor.backward()
print(weighted_sum_model.lambda_.grad)
And the printed value is None.
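A quick check after the forward pass seems to confirm that the scaled values never enter the graph: the parameters of b are still graph leaves, so their grad_fn is None (if I am reading grad_fn correctly):

for name, p in weighted_sum_model.b.named_parameters():
    # prints None for every parameter, i.e. the multiplication
    # by lambda_ was apparently never recorded by autograd
    print(name, p.grad_fn)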
How can I get the gradient of weighted_sum_model.lambda_ so that I can optimize this parameter?
I tried various ways of assigning the parameters of weighted_sum_model.b, but none of them worked. I also visualized the computation graph of WeightedSumModel: it contains only b, with neither a nor lambda_ appearing in it.
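For reference, one direction I have been experimenting with is building the scaled parameters out-of-place and running b's architecture with them via torch.func.functional_call (available in PyTorch 2.x). This is only a sketch of a replacement forward for WeightedSumModel, and I am not sure it is the right approach:

from torch.func import functional_call

def forward(self, x):
    # build the scaled tensors out-of-place so the multiplication
    # by lambda_ stays in the autograd graph
    scaled = {name: p * self.lambda_ for name, p in self.a.named_parameters()}
    # run b's architecture with the scaled tensors instead of b's own parameters
    return functional_call(self.b, scaled, (x,)).sum()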