Hi,

I have the following problem:

I have:

- a model M(x,p) that returns a single float value, where x is an input tensor and p are the parameters of the model
- a set of inputs x
- a set of labels y for training
- a loss L(x,y)

I want to find the parameters p that minimize the loss L(x', y), where x' = argmin_x M(x,p). In other words, I first want to find the x' that sits at a minimum of M(x,p), and then find the parameters p that minimize L(x', y).
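Written out as a two-level (bilevel) problem, using only the symbols defined above, this reads roughly as follows. The second line is the chain rule that the outer optimization would need, under the assumption that x' can be differentiated with respect to p:

```latex
p^{*} = \arg\min_{p} \; L\bigl(x'(p),\, y\bigr),
\qquad
x'(p) = \arg\min_{x} \; M(x, p)

\frac{\mathrm{d} L}{\mathrm{d} p}
  = \frac{\partial L}{\partial x'} \, \frac{\mathrm{d} x'(p)}{\mathrm{d} p}
```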

What I have done so far is probably very naive:

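```
import torch

# M, L, x and y are as defined above; x must be a leaf tensor with requires_grad=True
optimizer1 = torch.optim.AdamW(M.parameters())
for e1 in range(100):
    # inner loop: move x towards a minimum of M(x, p)
    optimizer2 = torch.optim.AdamW([x])
    for e2 in range(100):
        out = M(x)
        out.backward()
        optimizer2.step()
        optimizer2.zero_grad()
    # outer step: update the parameters p from the loss at the optimized x
    lossValue = L(x, y)
    lossValue.backward()
    optimizer1.step()
    optimizer1.zero_grad()
```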

With this approach, the optimization of the parameters (optimizer1) does not converge at all. Is there a better way to solve the problem? Could the issue be that the gradients from the inner optimization are not taken into account when updating the parameters? If so, is there a way to pass them on to the optimization of the parameters?
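For context on the last question: when x is a leaf tensor updated in place by its own optimizer, L(x,y) has no autograd connection to p, so the only gradients that reach the parameters are the ones accumulated by `out.backward()` in the inner loop. One possible way to pass the inner gradients through is to unroll the inner loop as plain gradient descent and keep its graph with `create_graph=True`. The following is only a minimal sketch under assumptions: `ToyModel`, the quadratic `L`, `inner_solve`, the tensor shapes, the step counts and the learning rates are all hypothetical stand-ins, not the real model or loss.

```python
import torch

torch.manual_seed(0)

class ToyModel(torch.nn.Module):
    """Hypothetical stand-in for M(x, p): a quadratic whose minimum in x is at p."""
    def __init__(self):
        super().__init__()
        self.p = torch.nn.Parameter(torch.tensor([2.0, -1.0]))

    def forward(self, x):
        return ((x - self.p) ** 2).sum()  # single scalar output

def L(x, y):
    """Hypothetical loss between the optimized input and the label."""
    return ((x - y) ** 2).sum()

def inner_solve(model, x0, steps=20, lr=0.1):
    """Unrolled gradient descent on x that stays differentiable w.r.t. p."""
    x = x0
    for _ in range(steps):
        out = model(x)
        # create_graph=True records the gradient computation itself,
        # so the final x remains a function of the parameters p
        (g,) = torch.autograd.grad(out, x, create_graph=True)
        x = x - lr * g  # out-of-place update keeps the graph intact
    return x

M = ToyModel()
y = torch.tensor([0.5, 0.5])
optimizer1 = torch.optim.AdamW(M.parameters(), lr=0.1)

losses = []
for e1 in range(200):
    x0 = torch.zeros(2, requires_grad=True)  # restart the inner problem
    x_opt = inner_solve(M, x0)
    lossValue = L(x_opt, y)
    optimizer1.zero_grad()
    lossValue.backward()  # backpropagates through all inner steps into p
    optimizer1.step()
    losses.append(lossValue.item())
```

The trade-off of unrolling is memory: the graph of every inner step is kept until the outer `backward()`, so the number of inner steps has to stay modest. The inner update here is plain gradient descent rather than AdamW, which is a swapped-in simplification to keep the update differentiable with a few lines of code.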

Thanks in advance