My loss function is defined as follows: $\text{Total loss} = \mathrm{MSE}(w, \phi) + \mathrm{L2norm}(F) + \mathrm{MSE}(\mathrm{L2norm}(\phi), 1)$. **w** is a d-dimensional vector which I am learning. I want to optimize **w** by taking the gradient of the total loss w.r.t. **w**. Is there a way to do so without creating a single-layer neural network?

You can just define `w` to be a tensor that requires gradient. You can instantiate the optimizer with `[w]` instead of `model.parameters()`, or you could do the optimization yourself in a `torch.no_grad` block.
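Here is a minimal sketch of both options. The question does not say how `F` and `\phi` depend on `w`, so the `total_loss` below is only a hypothetical stand-in (an MSE term against a fixed `phi` plus a small squared-L2 penalty on `w`); the dimension `d`, the learning rate, and the step count are likewise made up for illustration. You would swap in your real loss.

```python
import torch

torch.manual_seed(0)
d = 4                      # assumed dimension, for illustration only
phi = torch.randn(d)       # hypothetical fixed target standing in for phi

def total_loss(w):
    # Stand-in loss (assumption): MSE(w, phi) plus a small squared-L2 penalty.
    # Replace this with the actual total loss from the question.
    return torch.nn.functional.mse_loss(w, phi) + 1e-3 * w.pow(2).sum()

# Option 1: a plain tensor that requires grad, handed to the optimizer as [w].
w = torch.randn(d, requires_grad=True)
initial_opt = total_loss(w).item()
optimizer = torch.optim.SGD([w], lr=0.1)
for _ in range(200):
    optimizer.zero_grad()
    loss = total_loss(w)
    loss.backward()        # fills w.grad with d(loss)/dw
    optimizer.step()
final_opt = total_loss(w).item()

# Option 2: do the update by hand inside a torch.no_grad() block.
w_manual = torch.randn(d, requires_grad=True)
initial_manual = total_loss(w_manual).item()
for _ in range(200):
    loss = total_loss(w_manual)
    loss.backward()
    with torch.no_grad():
        w_manual -= 0.1 * w_manual.grad   # gradient descent step, not tracked by autograd
    w_manual.grad.zero_()                 # clear the gradient before the next iteration
final_manual = total_loss(w_manual).item()
```

In both cases `w` stays a leaf tensor, so no `nn.Module` is needed. The manual version has to zero the gradient itself, since `backward()` accumulates into `.grad`.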

