Convert tuple to tensor without breaking graph

gshartnett · February 7, 2019, 3:43am

I want to be able to take the gradient of the norm-squared gradient of the loss function of a neural network. That’s a bit of a mouthful: if theta are the parameters of a neural net (unrolled into a vector), and L is the loss function, then let g be the gradient of L with respect to theta. Letting ||g||^2 be the squared norm of g, I would like to take the gradient of ||g||^2 with respect to theta. (This is related to the question of computing the Hessian-vector product.)
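
For reference, the link to the Hessian-vector product can be written out explicitly. This is only a sketch using the symbols defined above, plus H for the Hessian of L with respect to theta (a symbol introduced here, not in the original post), and it assumes L is twice continuously differentiable so that H is symmetric:

```latex
g = \nabla_\theta L, \qquad H = \nabla_\theta^2 L, \qquad
\frac{\partial}{\partial \theta_j} \lVert g \rVert^2
  = \frac{\partial}{\partial \theta_j} \sum_i g_i^2
  = 2 \sum_i g_i \, \frac{\partial g_i}{\partial \theta_j}
  = 2 \sum_i H_{ij} \, g_i
\quad\Longrightarrow\quad
\nabla_\theta \lVert g \rVert^2 = 2 H g .
```

So the quantity asked for is twice the Hessian-vector product with the vector g itself.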

Here’s what I tried:

import torch
from torch import nn

linear = nn.Linear(10, 20)
x = torch.randn(1, 10)
L = linear(x).sum()
grad = torch.autograd.grad(L, linear.parameters(), create_graph=True)
z = grad @ grad
z.backward()

The problem this runs into is that grad is a tuple of tensors, and not a single unrolled tensor. Every way I tried of converting the tuple grad into an unrolled vector ends up breaking the graph, so that z.backward() either returns an error or None.

albanD · February 7, 2019, 9:59am

Hi,

The simplest way would be to do each of them one by one:

z = 0
for g in grad:
    z = z + g.pow(2).sum()
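
If a single unrolled vector is preferred, one possible alternative to the loop is to flatten each gradient and concatenate them. The sketch below reuses the setup from the question; the names z_loop, flat and z_flat are made up for illustration. Both reshape and torch.cat are ordinary differentiable operations, so they keep whatever graph the tensors in grad already carry. They do not add one if it is missing.

```python
import torch

linear = torch.nn.Linear(10, 20)
x = torch.randn(1, 10)
L = linear(x).sum()
grad = torch.autograd.grad(L, list(linear.parameters()), create_graph=True)

# Loop version: accumulate the squared norm tensor by tensor.
z_loop = 0
for g in grad:
    z_loop = z_loop + g.pow(2).sum()

# Flattened version: one 1-D tensor holding every gradient entry.
flat = torch.cat([g.reshape(-1) for g in grad])
z_flat = flat @ flat  # dot product of the unrolled gradient with itself

print(flat.shape, z_loop.item(), z_flat.item())
```

The two values should agree up to floating-point rounding, so the choice between them is mostly a matter of taste.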

gshartnett · February 7, 2019, 4:56pm

Replacing

z = grad @ grad

with

z = 0
for g in grad:
    z = z + g.pow(2).sum()

actually returns an error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

albanD · February 7, 2019, 5:00pm

Do the Tensors in grad require gradients?

gshartnett · February 7, 2019, 5:12pm

I think this is the issue - linear.parameters() returns a tuple of tensors, none of which require gradients. If I had defined the linear neural net by hand, rather than using the module, then I could declare that the weight tensor and bias vector both required gradients and this should work.

Since I want to use a deep neural network with all sorts of complicated layers, I would like to avoid needing to define it by hand in this way.

albanD · February 7, 2019, 5:14pm

Parameters of a net all require gradients by default (that’s why their gradients are computed and you can train them). Unless you set requires_grad to False, it should be True.
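
A quick way to check this for the layer from the question is to print the requires_grad flag of each parameter. This is a minimal sketch; the dictionary name flags is chosen here for illustration.

```python
import torch

linear = torch.nn.Linear(10, 20)

# Map each parameter name to its requires_grad flag.
flags = {name: p.requires_grad for name, p in linear.named_parameters()}
print(flags)
```

For a freshly constructed nn.Linear, both weight and bias should show up as True.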

gshartnett · February 7, 2019, 5:20pm

Ok thanks… I’m still confused about what’s going wrong then. Here’s a minimal example:

import torch
linear = torch.nn.Linear(10, 20)
x = torch.randn(1, 10)
L = linear(x).sum()
grad = torch.autograd.grad(L, linear.parameters(), create_graph=True)
gnorm = 0
for g in grad:
    gnorm = gnorm + g.pow(2).sum()
gnorm.backward()

I could replace the last line with

grad2 = torch.autograd.grad(gnorm, linear.parameters(), create_graph=True)

and get the same error.

albanD · February 7, 2019, 5:30pm

In that case, this is because your model is linear. Its gradient is therefore independent of its weights, so the gradients don’t require gradients: they never involved anything that requires gradients.
If your function is not linear, like L = linear(x).sum()**2, then it works as expected.
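
To see both cases side by side, here is a small sketch built on the minimal example above. The variable names (L_lin, L_sq, grad_lin, grad_sq, lin_requires_grad) are invented for illustration, and the squared sum is just the non-linear loss suggested in this post, not a recommendation for real training.

```python
import torch

torch.manual_seed(0)
linear = torch.nn.Linear(10, 20)
x = torch.randn(1, 10)
params = list(linear.parameters())

# Case 1: loss that is linear in the parameters.
# Its gradient depends only on x, so the returned tensors carry no graph.
L_lin = linear(x).sum()
grad_lin = torch.autograd.grad(L_lin, params, create_graph=True)
lin_requires_grad = [g.requires_grad for g in grad_lin]
print("linear loss, grads require grad:", lin_requires_grad)

# Case 2: non-linear loss (squared sum).
# The gradient now depends on the parameters, so it can be differentiated again.
L_sq = linear(x).sum() ** 2
grad_sq = torch.autograd.grad(L_sq, params, create_graph=True)
gnorm = sum(g.pow(2).sum() for g in grad_sq)
gnorm.backward()  # fills linear.weight.grad and linear.bias.grad
print("non-linear loss, weight.grad shape:", linear.weight.grad.shape)
```

Checking g.requires_grad on the output of torch.autograd.grad is a cheap way to tell, before calling backward(), whether a second derivative is available at all.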

gshartnett · February 7, 2019, 5:55pm

Ahhh… Great, thanks so much! This really helped!