PyTorch autograd on linear combination weights in the parameter space

I’m trying to multiply the parameters of one model (model A) by a scalar $\lambda$ to get another model (model B) with the same architecture as A but different parameters. Then I feed a tensor into model B and get the output. I want to calculate the gradient of the output with respect to $\lambda$, but .backward() doesn’t produce a gradient for it. Specifically, I run the following program:

import torch
import torch.nn as nn

class MyBaseModel(nn.Module):
    def __init__(self):
        super(MyBaseModel, self).__init__()
        self.linear1 = nn.Linear(3, 8)
        self.act1 = nn.ReLU()
        self.linear2 = nn.Linear(8, 4)
        self.act2 = nn.Sigmoid()
        self.linear3 = nn.Linear(4, 5)
    def forward(self, x):
        return self.linear3(self.act2(self.linear2(self.act1(self.linear1(x)))))

class WeightedSumModel(nn.Module):
    def __init__(self):
        super(WeightedSumModel, self).__init__()
        self.lambda_ = nn.Parameter(torch.tensor(2.0))
        self.a = MyBaseModel()
        self.b = MyBaseModel()
    def forward(self, x):
        for para_a, para_b in zip(self.a.parameters(), self.b.parameters()):
            para_b.data = para_a.data * self.lambda_
        return self.b(x).sum()

input_tensor = torch.ones((2, 3))
weighted_sum_model = WeightedSumModel()
output_tensor = weighted_sum_model(input_tensor)
output_tensor.backward()

print(weighted_sum_model.lambda_.grad)

And the printed value is None.

I wonder how I can get the gradient of weighted_sum_model.lambda_ so that I can optimize this parameter.

I tried various ways to get the parameters of weighted_sum_model.b, but none of them worked. I also visualized the computation graph of WeightedSumModel: it only contains b, not a or lambda_.
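
For example, after running the forward pass above, the parameters of b still have no grad_fn, which (as far as I understand) is why lambda_ never enters the graph:

for name, para_b in weighted_sum_model.b.named_parameters():
    # The .data assignment happens outside autograd, so every parameter of b
    # stays a leaf tensor with no recorded connection to lambda_.
    print(name, para_b.grad_fn)  # prints None for each parameter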

Hi @zhj2022,

The .data attribute is deprecated and shouldn’t be used, because assignments through it are invisible to autograd. Instead of writing into b’s registered parameters, build the scaled tensors directly, e.g. para_b = para_a * self.lambda_, so that the multiplication is recorded and lambda_ stays in the computation graph.
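
One way to feed those scaled tensors through the network without touching a second registered copy is torch.func.functional_call. A minimal sketch (assuming PyTorch 2.x, where torch.func.functional_call is available; MyBaseModel is the class defined above):

import torch
import torch.nn as nn

class WeightedSumModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.lambda_ = nn.Parameter(torch.tensor(2.0))
        self.a = MyBaseModel()  # base model whose parameters get scaled
    def forward(self, x):
        # Multiplying creates new tensors whose history includes lambda_,
        # so autograd can reach lambda_ from the output.
        scaled = {name: p * self.lambda_ for name, p in self.a.named_parameters()}
        # Run a's forward pass with the scaled tensors substituted for its parameters.
        return torch.func.functional_call(self.a, scaled, (x,)).sum()

input_tensor = torch.ones((2, 3))
weighted_sum_model = WeightedSumModel()
weighted_sum_model(input_tensor).backward()
print(weighted_sum_model.lambda_.grad)  # a scalar value now, not None

With this approach the second copy b isn’t needed at all; the scaled parameters are built on the fly in every forward pass, and lambda_ can be optimized like any other parameter.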