Optimize the derivatives of the output of the network

Suppose I have a network f that maps (x1, x2) to f(x1, x2). I want to minimize | df/dx1 + 2 * df/dx2 |, that is, a weighted sum of the derivatives of the network output with respect to its inputs. Does anyone have an idea how to do this? Thank you.
For example, my network looks like this:

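import torch
import torch.nn as nn

class test(nn.Module):
    def __init__(self):
        super().__init__()  # required before registering submodules
        bo_b = False
        self.l1 = nn.Linear(2, 1, bias = bo_b)
    def forward(self, state):
        v = self.l1(state)
        return v
tt1 = test()
tt2 = torch.tensor([1.0, 2.0])  # float input; nn.Linear does not accept integer tensors
tt3 = tt1(tt2)  # calling the module runs forward()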

How can I build the criterion | d tt3/d x1 + 2 * d tt3/d x2 | and minimize it? Thanks.


If you have f, x1 and x2 defined (with x1 and x2 requiring gradients), you can do:

out = f(x1, x2)
df_dx1, df_dx2 = autograd.grad(out, (x1, x2), create_graph=True)

loss = some_criterion(df_dx1 + 2 * df_dx2)

Note that out has to be a single value (a scalar) for this to work. Otherwise you would need the full matrix of gradients of every output with respect to the inputs, which is much more expensive to compute.

Hi, Alban,

Thanks for your reply. My current f is the network; how can I specify x1 and x2?
I tried it this way, but I get “One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.”

x1 = torch.tensor(1.0, requires_grad = True)
x2 = torch.tensor(2.0, requires_grad = True)
tt2 = torch.tensor([x1, x2])
tt3 = tt1.forward(tt2)
torch.autograd.grad(tt3, (x1, x2), create_graph = True)


Aren’t you getting a warning when you do torch.tensor([x1, x2])?
This creates a new Tensor from the values of x1 and x2, but not in a differentiable way. That is why x1 and x2 have not been used in the graph. torch.stack([x1, x2]) should work fine.
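To make the torch.stack suggestion concrete, here is a minimal sketch of the whole pipeline for a single sample. The name net is a hypothetical stand-in for the network in the question (a bare nn.Linear(2, 1, bias=False)), and taking the absolute value as the criterion is an assumption based on the original post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the network in the question: 2 inputs, 1 output.
net = nn.Linear(2, 1, bias=False)

x1 = torch.tensor(1.0, requires_grad=True)
x2 = torch.tensor(2.0, requires_grad=True)

# torch.tensor([x1, x2]) would copy the values and cut the graph;
# torch.stack keeps x1 and x2 connected to the output.
inp = torch.stack([x1, x2])   # shape (2,)
out = net(inp).squeeze()      # scalar output

# create_graph=True makes the derivatives themselves differentiable,
# so a loss built from them can be backpropagated to the weights.
df_dx1, df_dx2 = torch.autograd.grad(out, (x1, x2), create_graph=True)

# Assumed criterion: | df/dx1 + 2 * df/dx2 |
loss = (df_dx1 + 2 * df_dx2).abs()
loss.backward()               # fills net.weight.grad
```

After loss.backward(), net.weight.grad holds the gradient of the criterion with respect to the weights, so a regular optimizer step can be applied.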

Hi, Alban,

Thanks for your reply. There is no warning when I do torch.tensor([x1, x2]), but torch.stack([x1, x2]) works. Do you have any idea how to do this if I have a batch of data, say [[1, 2], [3, 4]]? Thanks.


You can just pass to autograd.grad whatever you had as input to your function.
I mentioned stack here because you were using something like it to generate tt2 but you don’t have to use it.
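As a rough illustration of passing the input itself to autograd.grad, the sketch below uses one input tensor x instead of separate x1 and x2. The names net and coeffs are hypothetical; coeffs just encodes the weights 1 and 2 from the original criterion.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the network in the question.
net = nn.Linear(2, 1, bias=False)

# One leaf tensor holding both inputs; no stack or cat needed.
x = torch.tensor([1.0, 2.0], requires_grad=True)
out = net(x).squeeze()        # scalar output

# Gradient of the output w.r.t. the whole input, shape (2,).
(grad_x,) = torch.autograd.grad(out, x, create_graph=True)

# Weights of the criterion: 1 * df/dx1 + 2 * df/dx2.
coeffs = torch.tensor([1.0, 2.0])
loss = torch.dot(coeffs, grad_x).abs()
```

Here grad_x[0] plays the role of df/dx1 and grad_x[1] the role of df/dx2.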

Hi, Alban @albanD

Thanks for your reply. I still have a problem when the input is a batch. I have the following code:

tt1 = test()
x1 = torch.tensor([[1.0], [2.0]], requires_grad = True)
x2 = torch.tensor([[2.0], [3.0]], requires_grad = True)
tt2 = torch.cat([x1, x2], dim = -1)

tt3 = tt1.forward(tt2)

d1, d2 = torch.autograd.grad(tt3, (x1, x2), create_graph = True)

But I get this error message:
grad can be implicitly created only for scalar outputs

How can I solve this issue?


As the error mentions, your output (tt3) is not a scalar, so autograd.grad cannot create the initial gradient for you implicitly. You need to either compute a scalar loss from it or provide a grad_outputs argument, depending on what you want to do for your application.
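One possible way to apply the grad_outputs option to the batched code is sketched below. It assumes each output row depends only on the matching input row (true for a plain linear layer, not for layers that mix samples such as BatchNorm), in which case a tensor of ones as grad_outputs yields per-sample derivatives. The name net and the mean reduction are assumptions, not something specified in the thread.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the network in the question.
net = nn.Linear(2, 1, bias=False)

x1 = torch.tensor([[1.0], [2.0]], requires_grad=True)
x2 = torch.tensor([[2.0], [3.0]], requires_grad=True)
inp = torch.cat([x1, x2], dim=-1)   # shape (2, 2): one row per sample
out = net(inp)                      # shape (2, 1): not a scalar

# grad_outputs of ones is equivalent to differentiating out.sum().
# Because samples are independent here, row i of d1 and d2 holds the
# derivatives of out[i] with respect to x1[i] and x2[i].
d1, d2 = torch.autograd.grad(
    out, (x1, x2),
    grad_outputs=torch.ones_like(out),
    create_graph=True,
)

# Assumed batch criterion: mean over samples of | df/dx1 + 2 * df/dx2 |.
loss = (d1 + 2 * d2).abs().mean()
loss.backward()                     # fills net.weight.grad
```

Calling torch.autograd.grad(out.sum(), (x1, x2), create_graph=True) should give the same d1 and d2 under the same independence assumption.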