My loss function is defined as follows: $\text{TotalLoss} = \mathrm{MSE}(w, \phi) + \lVert F \rVert_2 + \mathrm{MSE}(\lVert \phi \rVert_2, 1)$, where $w$ is a $d$-dimensional vector that I am learning. I want to optimize $w$ by taking the gradient of the total loss with respect to $w$. Is there a way to do this without creating a single-layer neural network?
You can just define `w` as a tensor with `requires_grad=True` and instantiate the optimizer with the list `[w]` instead of `model.parameters()` -- no `nn.Module` is needed.
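A minimal sketch of that approach (the values of `phi` and `F`, the learning rate, and the step count are all made-up placeholders, since your post doesn't specify them):

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for your phi and F (not from the original post)
d = 5
phi = torch.randn(d)
F = torch.randn(d)

# w is a plain leaf tensor that requires gradients -- no nn.Module needed
w = torch.randn(d, requires_grad=True)

# Pass the list [w] to the optimizer instead of model.parameters()
optimizer = torch.optim.SGD([w], lr=0.1)

mse = torch.nn.functional.mse_loss
initial_dist = (w - phi).norm().item()

for step in range(200):
    optimizer.zero_grad()
    loss = (
        mse(w, phi)                            # MSE(w, phi)
        + F.norm(2)                            # L2 norm of F (constant w.r.t. w)
        + mse(phi.norm(2), torch.tensor(1.0))  # MSE(||phi||_2, 1), also constant
    )
    loss.backward()   # gradients accumulate into w.grad
    optimizer.step()  # updates w in place

final_dist = (w - phi).norm().item()
```

Note that only the first term depends on `w`, so the constant terms contribute to the loss value but not to `w.grad`.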
Alternatively, you could skip the optimizer entirely and apply the update yourself inside a `torch.no_grad()` block.
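A sketch of the manual variant, again with a placeholder `phi` and only the `w`-dependent term of your loss (the other two terms have zero gradient with respect to `w`):

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins (not from the original post)
d = 5
phi = torch.randn(d)
w = torch.randn(d, requires_grad=True)
lr = 0.1

for step in range(200):
    loss = torch.nn.functional.mse_loss(w, phi)  # the w-dependent term
    loss.backward()
    # Manual SGD step; no_grad stops autograd from recording the update itself
    with torch.no_grad():
        w -= lr * w.grad
        w.grad.zero_()  # clear the gradient before the next iteration
```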
Best regards
Thomas