I’m currently trying to calculate the Hessian w.r.t. the weights of a linear layer.
Since the weights are given as a matrix, I would like to reshape it into a vector. Otherwise my gradient is also a matrix, which is inconvenient for further calculation.
As soon as I apply `view` to the weights, however, PyTorch fails to calculate the gradient with `RuntimeError: differentiated input is unreachable`.
I use the following code, inspired by this discussion:
```python
import torch

# nll is the scalar negative-log-likelihood loss, layer is a linear layer
# As soon as I flatten here, differentiation stops working
flat = layer.weight.view(layer.weight.numel())
# Calculate the gradient w.r.t. the flattened weights
weight_grad = torch.autograd.grad(nll, flat, create_graph=True)[0]
# Calculate and stack the Hessian rows
hessian = [torch.autograd.grad(weight_grad[i], flat, create_graph=True)[0]
           for i in range(flat.size(0))]
hessian = torch.stack(hessian)
```
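
The same error reproduces in a self-contained toy setup (a sketch; the layer, data, and loss here are just placeholders standing in for my actual model):

```python
import torch

layer = torch.nn.Linear(3, 2)   # toy stand-in for my linear layer
x = torch.randn(5, 3)
target = torch.randint(0, 2, (5,))
nll = torch.nn.functional.cross_entropy(layer(x), target)  # toy NLL loss

flat = layer.weight.view(layer.weight.numel())
# torch.autograd.grad fails here with the RuntimeError quoted above
weight_grad = torch.autograd.grad(nll, flat, create_graph=True)
```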
I assume that `view` is not compatible with the `autograd` module. What would be the correct approach here?
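
For reference, differentiating w.r.t. the original weight matrix and flattening the results afterwards does run in the same toy setup (again only a sketch with placeholder layer and loss), but I’d prefer to avoid the extra reshaping at every step:

```python
import torch

layer = torch.nn.Linear(3, 2)   # same toy stand-ins as above
x = torch.randn(5, 3)
target = torch.randint(0, 2, (5,))
nll = torch.nn.functional.cross_entropy(layer(x), target)

# Gradient w.r.t. the matrix-shaped weight, flattened after the fact
grad = torch.autograd.grad(nll, layer.weight, create_graph=True)[0].reshape(-1)

# Each Hessian row is the gradient of one entry of the flattened gradient,
# again taken w.r.t. the matrix-shaped weight and flattened afterwards
hessian = torch.stack([
    torch.autograd.grad(grad[i], layer.weight, retain_graph=True)[0].reshape(-1)
    for i in range(grad.numel())
])
print(hessian.shape)  # torch.Size([6, 6])
```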