Gradient of variable that has been reshaped


I’m currently trying to calculate the hessian w.r.t. the weights of a linear layer.

Since the weights are given as a matrix, I would like to re-shape it to the form of a vector. Otherwise, my gradient is also a matrix, which is inconvenient for further calculation.

As soon as I apply view to the weights, however, PyTorch fails to calculate the gradient with RuntimeError: differentiated input is unreachable.

I use the following code, inspired by this discussion:

# nll is a negative-log-likelihood loss function, layer is a linear layer

# As soon as I flatten here, differentiation stops working
flat = layer.weight.view(layer.weight.numel())

# Calculate gradient
weight_grad = torch.autograd.grad(nll, flat, create_graph=True)[0]

# Calculate and concatenate hessian
hessian = [torch.autograd.grad(weight_grad[i], flat, create_graph=True)[0] for i in range(flat.size(0))]
hessian = torch.stack(hessian)

I assume that view is not compatible with the autograd module. What would be the correct approach here?

The docs say that the second argument to torch.autograd.grad should be the inputs. In this case, you should be passing in layer.weight, not flat, because layer.weight is the original Variable and flat is a Variable derived from it.
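A minimal sketch of that approach (the layer shape, input, and loss below are made up for illustration): differentiate with respect to layer.weight, and flatten the resulting gradient instead of the weight itself.

```python
import torch

# Hypothetical small setup: a 3-in / 2-out linear layer with a cross-entropy loss
layer = torch.nn.Linear(3, 2)
x = torch.randn(5, 3)
target = torch.randint(0, 2, (5,))
nll = torch.nn.functional.cross_entropy(layer(x), target)

# Differentiate w.r.t. the original parameter, then flatten the *gradient*
grad = torch.autograd.grad(nll, layer.weight, create_graph=True)[0]
grad_flat = grad.view(-1)

# One Hessian row per gradient entry; retain_graph so the loop can reuse the graph
hessian = torch.stack([
    torch.autograd.grad(grad_flat[i], layer.weight, retain_graph=True)[0].view(-1)
    for i in range(grad_flat.numel())
])
print(hessian.shape)  # torch.Size([6, 6])
```

Since the weight has 2 × 3 = 6 entries here, the Hessian comes out as a 6 × 6 matrix, which is exactly the vector-shaped form the flattening was meant to produce.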

Does that make sense?


That does make sense, especially in my specific case.

Still, why is it impossible to get the gradient of a variable created by view?

torch.autograd.grad(outputs, inputs) computes the gradients of outputs w.r.t. inputs.

You can compute the gradient of a variable created by view with respect to an input, if that’s what you’re looking for.
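For example (a made-up standalone tensor, not the linear layer from the question), a viewed tensor works fine as the outputs argument, because the view depends on the input:

```python
import torch

x = torch.randn(2, 3, requires_grad=True)
flat = x.view(-1)        # flat is downstream of x in the graph
out = (flat ** 2).sum()  # scalar built from the viewed tensor
g = torch.autograd.grad(out, x)[0]
print(g.shape)  # torch.Size([2, 3]); g equals 2 * x
```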

That makes sense.

To rephrase it: it fails because the output of the forward pass depends on the variable layer.weight, not on the variable flat. PyTorch therefore cannot find flat in the computation graph of nll.
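A small sketch of exactly this failure mode (the layer, input, and loss are illustrative): flat is created after nll, so nll was never computed from it, and asking for the gradient raises a RuntimeError.

```python
import torch

layer = torch.nn.Linear(3, 2)
x = torch.randn(4, 3)
target = torch.randint(0, 2, (4,))
nll = torch.nn.functional.cross_entropy(layer(x), target)

# flat is a new graph node created *after* nll; nll does not depend on it
flat = layer.weight.view(-1)
try:
    torch.autograd.grad(nll, flat)
    reachable = True
except RuntimeError:
    reachable = False
print(reachable)  # False: flat is not in the graph of nll
```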