I’m currently trying to calculate the Hessian w.r.t. the weights of a linear layer.
Since the weights are given as a matrix, I would like to reshape it into a vector. Otherwise my gradient is also a matrix, which is inconvenient for further calculation.
As soon as I apply `view` to the weights, however, PyTorch fails to calculate the gradient with `RuntimeError: differentiated input is unreachable`.
I use the following code, inspired by this discussion:
```python
import torch

# nll is the scalar negative-log-likelihood loss, layer is a linear layer
# As soon as I flatten here, differentiation stops working
flat = layer.weight.view(layer.weight.numel())
# Calculate the gradient w.r.t. the flattened weights
weight_grad = torch.autograd.grad(nll, flat, create_graph=True)[0]
# Calculate and stack the Hessian rows
hessian = [torch.autograd.grad(weight_grad[i], flat, create_graph=True)[0]
           for i in range(flat.size(0))]
hessian = torch.stack(hessian)
```
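
The same error reproduces in a self-contained toy setup (a sketch; the layer, data, and loss here are just placeholders standing in for my actual model):

```python
import torch

layer = torch.nn.Linear(3, 2)   # toy stand-in for my linear layer
x = torch.randn(5, 3)
target = torch.randint(0, 2, (5,))
nll = torch.nn.functional.cross_entropy(layer(x), target)  # toy NLL loss

flat = layer.weight.view(layer.weight.numel())
# torch.autograd.grad fails here with the RuntimeError quoted above
weight_grad = torch.autograd.grad(nll, flat, create_graph=True)
```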
I assume that `view` is not compatible with the `autograd` module. What would be the correct approach here?
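
For reference, differentiating w.r.t. the original weight matrix and flattening the results afterwards does run in the same toy setup (again only a sketch with placeholder layer and loss), but I’d prefer to avoid the extra reshaping at every step:

```python
import torch

layer = torch.nn.Linear(3, 2)   # same toy stand-ins as above
x = torch.randn(5, 3)
target = torch.randint(0, 2, (5,))
nll = torch.nn.functional.cross_entropy(layer(x), target)

# Gradient w.r.t. the matrix-shaped weight, flattened after the fact
grad = torch.autograd.grad(nll, layer.weight, create_graph=True)[0].reshape(-1)

# Each Hessian row is the gradient of one entry of the flattened gradient,
# again taken w.r.t. the matrix-shaped weight and flattened afterwards
hessian = torch.stack([
    torch.autograd.grad(grad[i], layer.weight, retain_graph=True)[0].reshape(-1)
    for i in range(grad.numel())
])
print(hessian.shape)  # torch.Size([6, 6])
```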