I have the following problem:
I have:
- a model M(x, p) that returns a single float value as output, where x is an input tensor and p are the parameters of the model.
- a set of inputs x
- a set of labels y for training
- a loss L(x,y)
I want to find the parameters p that minimize the loss L between argmin_x M(x, p) and y. In other words: first I want to find the x' that sits at a minimum of M(x, p), and then I want to find the parameters p that minimize L(x', y).
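Written more formally, I believe this is a bilevel optimization problem:

$$
\min_p \; L\big(x'(p),\, y\big) \qquad \text{where} \qquad x'(p) = \operatorname*{argmin}_x \, M(x, p)
$$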
What I am doing right now is probably very naive:
```python
optimizer1 = torch.optim.AdamW(M.parameters())  # outer loop: optimizes the parameters p
for e1 in range(100):
    optimizer2 = torch.optim.AdamW([x])         # inner loop: optimizes the input x
    for e2 in range(100):                       # (x is a leaf tensor with requires_grad=True)
        out = M(x)
        out.backward()
        optimizer2.step()
        optimizer2.zero_grad()
    lossValue = L(x, y)
    lossValue.backward()
    optimizer1.step()
    optimizer1.zero_grad()
```
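For reference, here is a toy setup under which the snippet above runs end to end; the M, L, x, and y below are placeholders I made up for illustration, not my real model and data:

```python
import torch

# Placeholder model: maps a 4-dim input to a single value
M = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))

# Placeholder loss comparing the optimized input x' with the labels y
def L(x, y):
    return torch.nn.functional.mse_loss(x, y)

x = torch.randn(4, requires_grad=True)  # input: a leaf tensor so optimizer2 can update it
y = torch.randn(4)                      # labels
```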
With this approach, the optimization performed on the parameters (i.e., optimizer1) does not converge at all. Is there a better way to solve the problem? Could the issue be that the gradients coming from the inner optimization are not taken into consideration by the outer one? And if so, is there a way to pass them on to the optimization of the parameters?
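To make the last question concrete, I sketched below what I imagine "passing the inner gradients" could look like: replacing optimizer2 with plain, unrolled gradient steps on x kept inside the autograd graph. This is just my guess (the inner_lr and the number of inner steps are made up), and I am also worried about the memory cost of unrolling:

```python
inner_lr = 0.01  # made-up step size for the inner (differentiable) updates

for e1 in range(100):
    xi = x.detach().clone().requires_grad_(True)  # fresh inner starting point
    for e2 in range(20):
        out = M(xi)
        # create_graph=True keeps each inner step differentiable w.r.t. the parameters p
        (grad_x,) = torch.autograd.grad(out, xi, create_graph=True)
        xi = xi - inner_lr * grad_x               # unrolled gradient step instead of AdamW
    lossValue = L(xi, y)   # xi now depends on p through the unrolled inner steps
    lossValue.backward()   # so gradients flow back into M.parameters()
    optimizer1.step()
    optimizer1.zero_grad()
```

Is something like this the intended approach, or is there a more standard way?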
Thanks in advance