I want to be able to take the gradient of the norm-squared gradient of the loss function of a neural network. That’s a bit of a mouthful: if theta are the parameters of a neural net (unrolled into a vector), and L is the loss function, then let g be the gradient of L with respect to theta. Letting ||g||^2 be the norm-squared of the gradient, I would like to take the gradient of this with respect to theta. (This is related to the question of computing the Hessian vector product).
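For context, here is the identity that makes the connection to Hessian-vector products precise (writing H for the Hessian of L at theta, and using that H is symmetric):

```latex
\nabla_\theta \,\|g\|^2 \;=\; \nabla_\theta \big( g^\top g \big) \;=\; 2\, H g
```

So if I could differentiate ||g||^2, I would get the Hessian-vector product Hg (up to a factor of 2) without ever forming H.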
Here’s what I tried:
import torch
import torch.nn as nn

linear = nn.Linear(10, 20)
x = torch.randn(1, 10)
L = linear(x).sum()

# create_graph=True so the gradients themselves are differentiable
grad = torch.autograd.grad(L, linear.parameters(), create_graph=True)
z = grad @ grad  # fails: grad is a tuple of tensors, not a tensor
The problem this runs into is that grad is a tuple of tensors, not a single unrolled vector. Every way I tried of converting the tuple grad into a single flat tensor ends up breaking the graph, so that z.backward() either raises an error or leaves the parameters' .grad attributes set to None.
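For what it's worth, here is a sketch of the approach I was aiming for, with one change to the toy example: I replaced .sum() with .pow(2).sum(), since a loss that is linear in the parameters has a constant gradient, so ||g||^2 would not depend on theta at all and the second differentiation would have nothing to compute. Flattening each gradient with reshape(-1) and joining them with torch.cat is the part I am unsure about, i.e. whether it keeps the autograd graph intact:

```python
import torch
import torch.nn as nn

linear = nn.Linear(10, 20)
x = torch.randn(1, 10)

# Squared output so the gradient actually depends on the parameters
L = linear(x).pow(2).sum()

# First-order gradients, keeping the graph so they can be differentiated again
grads = torch.autograd.grad(L, linear.parameters(), create_graph=True)

# Flatten the tuple of per-parameter gradients into one unrolled vector
g = torch.cat([grad.reshape(-1) for grad in grads])

# Norm-squared of the gradient
z = g @ g  # equivalently: (g * g).sum()

# Gradient of ||g||^2 with respect to the parameters
second = torch.autograd.grad(z, linear.parameters())
```

The result second is again a tuple, with one tensor per parameter (here, one of shape (20, 10) for the weight and one of shape (20,) for the bias).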