Optimization changing the gradient variables

I have the following problem:

I have:

  1. a model M(x,p) that returns a single float value, where x is an input tensor and p are the parameters of the model
  2. a set of inputs x
  3. a set of labels y for training
  4. a loss L(x,y)

I want to find the parameters p that minimize the loss L between argmin_x( M(x,p) ) and y. In other words, I first want to find the x’ that sits at a minimum of M(x,p), and then find the parameters p that minimize L(x’,y).
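Written out, this looks like a bilevel optimization problem. The block below is only a sketch of the formulation, assuming the inner minimum x' is unique and that the Hessian of M with respect to x is invertible there (neither is guaranteed for a general model). Under those assumptions, the implicit function theorem gives the sensitivity of x' to p, which is the term the outer gradient needs:

```latex
\begin{aligned}
x'(p) &= \arg\min_{x} M(x, p), \\
p^{*} &= \arg\min_{p} L\bigl(x'(p), y\bigr), \\
\frac{dL}{dp} &= \frac{\partial L}{\partial x'} \, \frac{dx'}{dp},
\qquad
\frac{dx'}{dp} = -\bigl[\nabla^{2}_{xx} M(x', p)\bigr]^{-1} \, \nabla^{2}_{xp} M(x', p).
\end{aligned}
```

The last identity follows from differentiating the stationarity condition \nabla_x M(x'(p), p) = 0 with respect to p.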

What I have done so far is probably very naive:

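import torch
optimizer1 = torch.optim.AdamW(M.parameters())  # outer optimizer, over the parameters p
for e1 in range(100):
    optimizer2 = torch.optim.AdamW([x])  # inner optimizer; x must be a leaf tensor with requires_grad=True
    for e2 in range(100):  # inner loop: move x toward a minimum of M(x,p)
        optimizer2.zero_grad()
        M(x).backward()
        optimizer2.step()
    optimizer1.zero_grad()  # also clears the parameter gradients accumulated by the inner loop
    lossValue = L(x, y)  # x is a plain leaf tensor here, so this loss carries no gradient back to p
    lossValue.backward()
    optimizer1.step()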

With this setup, the optimization of the parameters (optimizer1) does not converge at all. Is there a better way to solve the problem? Could the issue be that the gradients coming from the inner optimization are not taken into account? If so, is there a way to pass them to the optimization of the parameters?
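One way to check that suspicion is to ask autograd directly whether the outer loss depends on the parameters at all. The snippet below is a minimal sketch: `ToyModel`, the tensor shapes, and the use of `mse_loss` as L are hypothetical stand-ins, not the actual model or loss.

```python
import torch

torch.manual_seed(0)

class ToyModel(torch.nn.Module):
    """Hypothetical stand-in for M: maps x to a single scalar."""
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(4, 1)

    def forward(self, x):
        return self.lin(x).pow(2).sum()

M = ToyModel()
x = torch.randn(4, requires_grad=True)
y = torch.randn(4)

# Inner loop as in the question: update x in place with its own optimizer.
inner_opt = torch.optim.AdamW([x], lr=0.1)
for _ in range(10):
    inner_opt.zero_grad()
    M(x).backward()
    inner_opt.step()

# Outer loss (hypothetical choice of L). x is still a leaf tensor,
# so the graph of this loss never touches the parameters of M.
loss = torch.nn.functional.mse_loss(x, y)
grads = torch.autograd.grad(loss, list(M.parameters()), allow_unused=True)
print(grads)  # one entry per parameter; None means "no dependence"
```

If every entry is `None`, the outer optimizer receives no signal from L, which would be consistent with the parameters not converging.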

Thanks in advance


I am not very familiar with this kind of task or with why it might fail to converge. But facebookresearch/higher on GitHub, a PyTorch library that lets users obtain higher-order gradients over losses spanning training loops rather than individual training steps, seems very close to what you’re trying to do, no?
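To illustrate the general idea without relying on any particular library, here is a minimal sketch in plain PyTorch that unrolls the inner loop and keeps it differentiable, so the outer loss can backpropagate into the parameters. Everything in it is an assumption made for illustration: the toy `M` and `L`, the helper `inner_solve`, the step counts, and the learning rates. It swaps the inner AdamW for plain gradient descent to keep the unrolled graph simple, and it is not how higher is implemented internally.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy model M(x, p): scalar output, minimized over x at x = p.
def M(x, p):
    return ((x - p) ** 2).sum()

# Hypothetical outer loss L(x, y).
def L(x, y):
    return ((x - y) ** 2).mean()

def inner_solve(p, x0, steps=50, lr=0.1):
    """Approximate argmin_x M(x, p) by gradient descent, keeping the
    graph so that the result stays differentiable with respect to p."""
    x = x0
    for _ in range(steps):
        (g,) = torch.autograd.grad(M(x, p), x, create_graph=True)
        x = x - lr * g  # out-of-place update preserves the dependence on p
    return x

p = torch.zeros(3, requires_grad=True)
y = torch.tensor([1.0, -2.0, 0.5])
outer_opt = torch.optim.AdamW([p], lr=0.1, weight_decay=0.0)

for _ in range(200):
    x0 = torch.zeros(3, requires_grad=True)
    x_star = inner_solve(p, x0)   # x' as a differentiable function of p
    loss = L(x_star, y)
    outer_opt.zero_grad()
    loss.backward()               # gradient flows through the unrolled inner loop
    outer_opt.step()

final_loss = L(inner_solve(p, torch.zeros(3, requires_grad=True)), y).item()
print(final_loss)
```

The trade-off is memory: the graph of every inner step is kept until the outer backward pass, so long inner loops or large models may need fewer unrolled steps or an implicit-differentiation approach instead.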