Taking the gradient of a multi-term loss function w.r.t. a single variable

My loss function is defined as follows: $Total\ loss = MSE(w, \phi) + L2norm(F) + MSE(L2norm(\phi), 1)$. Here w is a d-dimensional vector that I am learning. I want to optimize w by taking the gradient of the total loss w.r.t. w. Is there a way to do this without creating a single-layer neural network?

You can just define w as a tensor that requires gradients (requires_grad=True).
You can then instantiate the optimizer with [w] instead of model.parameters(), or you could do the update yourself in a torch.no_grad() block.
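
Here is a minimal sketch of both options. The question does not say where phi and F come from, so they are random fixed placeholder tensors here; the dimension d = 5, the learning rate, and the step count are arbitrary assumptions too.

```python
import torch
import torch.nn.functional as nnf  # aliased as nnf so it does not clash with the tensor F

torch.manual_seed(0)
d = 5

# Hypothetical placeholder data: treated as fixed tensors for this sketch.
phi = torch.randn(d)
F = torch.randn(d)

def total_loss(w):
    # MSE(w, phi) + L2norm(F) + MSE(L2norm(phi), 1)
    return (
        nnf.mse_loss(w, phi)
        + torch.linalg.vector_norm(F)
        + nnf.mse_loss(torch.linalg.vector_norm(phi), torch.tensor(1.0))
    )

# Option 1: hand the bare tensor to an optimizer as [w].
w_opt = torch.randn(d, requires_grad=True)
w_init = w_opt.detach().clone()  # keep the starting point for option 2
initial_loss = total_loss(w_opt).item()

optimizer = torch.optim.SGD([w_opt], lr=0.5)
for _ in range(50):
    optimizer.zero_grad()
    loss = total_loss(w_opt)
    loss.backward()   # fills w_opt.grad
    optimizer.step()
final_loss = total_loss(w_opt).item()

# Option 2: plain gradient descent done by hand inside torch.no_grad().
w_manual = w_init.clone().requires_grad_(True)
for _ in range(50):
    loss = total_loss(w_manual)
    loss.backward()
    with torch.no_grad():
        w_manual -= 0.5 * w_manual.grad  # in-place update, not tracked by autograd
    w_manual.grad.zero_()                # reset the accumulated gradient

print(initial_loss, final_loss)
```

With phi and F fixed like this, only the first term depends on w, so the other two just add a constant. If in your setup phi or F are computed from w, compute them inside total_loss and autograd will propagate through them in the same way.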

Best regards