My loss function is defined as $\text{TotalLoss} = \mathrm{MSE}(w, \phi) + \mathrm{L2norm}(F) + \mathrm{MSE}(\mathrm{L2norm}(\phi), 1)$, where $w$ is a $d$-dimensional vector that I am learning. I want to optimize $w$ by taking the gradient of the total loss with respect to $w$. Is there a way to do this without creating a single-layer neural network?
You can just define w as a tensor that requires gradient. You can instantiate the optimizer with [w] instead of model.parameters(), or you could do the optimization yourself in a manual loop: compute the loss, call loss.backward(), and update w in place.
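A minimal sketch of the optimizer-based approach, assuming illustrative values for d, phi, and F (the question does not specify them) and plain SGD as the optimizer:

```python
import torch

# Hypothetical dimensions and fixed tensors for illustration only.
d = 5
phi = torch.randn(d)
F = torch.randn(d)

# w is a plain leaf tensor with requires_grad=True -- no nn.Module needed.
w = torch.randn(d, requires_grad=True)

# Pass [w] directly instead of model.parameters().
optimizer = torch.optim.SGD([w], lr=0.1)

mse = torch.nn.functional.mse_loss

for step in range(200):
    optimizer.zero_grad()
    # Total loss as stated in the question; the last two terms do not
    # depend on w, so only MSE(w, phi) contributes to w's gradient.
    loss = (mse(w, phi)
            + F.norm(p=2)
            + mse(phi.norm(p=2), torch.tensor(1.0)))
    loss.backward()
    optimizer.step()
```

Since only the first term depends on w, gradient descent drives w toward phi; the other two terms are constants that shift the loss value but not the update direction.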