I’m currently trying to calculate the Hessian w.r.t. the weights of a linear layer.
Since the weights are given as a matrix, I would like to reshape it into a vector. Otherwise my gradient is also a matrix, which is inconvenient for further calculation.
As soon as I apply `view` to the weights, however, PyTorch fails to calculate the gradient with `RuntimeError: differentiated input is unreachable`.
I use the following code, inspired by this discussion:
```python
import torch

# nll is the scalar negative-log-likelihood loss, layer is a linear layer
# As soon as I flatten here, differentiation stops working
flat = layer.weight.view(layer.weight.numel())
# Calculate the gradient w.r.t. the flattened weights
weight_grad = torch.autograd.grad(nll, flat, create_graph=True)[0]
# Calculate and stack the Hessian rows
hessian = [torch.autograd.grad(weight_grad[i], flat, create_graph=True)[0]
           for i in range(flat.size(0))]
hessian = torch.stack(hessian)
```
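
The same error reproduces in a self-contained toy setup (a sketch; the layer, data, and loss here are just placeholders standing in for my actual model):

```python
import torch

layer = torch.nn.Linear(3, 2)   # toy stand-in for my linear layer
x = torch.randn(5, 3)
target = torch.randint(0, 2, (5,))
nll = torch.nn.functional.cross_entropy(layer(x), target)  # toy NLL loss

flat = layer.weight.view(layer.weight.numel())
# torch.autograd.grad fails here with the RuntimeError quoted above
weight_grad = torch.autograd.grad(nll, flat, create_graph=True)
```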
I assume that `view` is not compatible with the `autograd` module. What would be the correct approach here?
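
For reference, differentiating w.r.t. the original weight matrix and flattening the results afterwards does run in the same toy setup (again only a sketch with placeholder layer and loss), but I’d prefer to avoid the extra reshaping at every step:

```python
import torch

layer = torch.nn.Linear(3, 2)   # same toy stand-ins as above
x = torch.randn(5, 3)
target = torch.randint(0, 2, (5,))
nll = torch.nn.functional.cross_entropy(layer(x), target)

# Gradient w.r.t. the matrix-shaped weight, flattened after the fact
grad = torch.autograd.grad(nll, layer.weight, create_graph=True)[0].reshape(-1)

# Each Hessian row is the gradient of one entry of the flattened gradient,
# again taken w.r.t. the matrix-shaped weight and flattened afterwards
hessian = torch.stack([
    torch.autograd.grad(grad[i], layer.weight, retain_graph=True)[0].reshape(-1)
    for i in range(grad.numel())
])
print(hessian.shape)  # torch.Size([6, 6])
```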