Hi @soulitzer,
I’m a little confused about how I can get these values. I wrote the following script yesterday to test what you stated above, but my attempt throws an error and I’m not 100% sure why it fails.
import torch
import torch.nn as nn


class network(nn.Module):
    def __init__(self):
        super(network, self).__init__()
        self.fc1 = nn.Linear(2, 32, bias=True)
        self.fc2 = nn.Linear(32, 32, bias=True)
        self.fc3 = nn.Linear(32, 2, bias=True)

    def forward(self, x):
        # keep the intermediate activations around so they can be differentiated later
        self.x1 = self.fc1(x)
        self.x2 = self.fc2(self.x1)
        self.x3 = self.fc3(self.x2)
        return self.x3


net = network()
x = torch.randn(4096, 2)
y = net(x)
loss = y.sum(dim=-1).mean()

xis = [net.x1, net.x2, net.x3]
for xi in xis:
    # first-order grad of the loss w.r.t. each intermediate, kept differentiable
    grad, = torch.autograd.grad(loss, xi, torch.ones_like(loss),
                                retain_graph=True, create_graph=True)
    for xj in xis:
        # second-order grad of that grad w.r.t. each intermediate
        gradgrad, = torch.autograd.grad(grad, xj, torch.ones_like(grad), allow_unused=True)
The error message is,
Traceback (most recent call last):
  File "print_hessian.py", line 32, in <module>
    gradgrad, = torch.autograd.grad(grad, xj, torch.ones_like(grad), allow_unused=True)
  File "~/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 234, in grad
    return Variable._execution_engine.run_backward(
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Now, I’d assume this happens because each of the xis doesn’t have a grad_fn? Is that correct?
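I suppose I could check that directly with something like the following (just a quick sketch reusing net and x from the script above, printing whether each intermediate and its first-order grad carry a grad_fn):

y = net(x)                      # fresh forward pass so net.x1 ... net.x3 are populated
loss = y.sum(dim=-1).mean()
for i, xi in enumerate([net.x1, net.x2, net.x3], start=1):
    g, = torch.autograd.grad(loss, xi, retain_graph=True, create_graph=True)
    print(f"x{i}: xi.grad_fn={xi.grad_fn}, grad.grad_fn={g.grad_fn}, "
          f"grad.requires_grad={g.requires_grad}")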
It might be easier to tell you exactly why I wanted the grad_output terms differentiated: I’m trying to get the Hessian of my loss with respect to all parameters, for every input sample. So, in effect, I want a tensor of size [B, N, N], where B is the batch size and N is the number of parameters in the network. (Although this could be reduced to a given pair of layers, so N would be Ni + Nj for the i-th and j-th layers.)
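To make that concrete, here is a rough (and deliberately naive) sketch of the quantity I’m after, reusing net and x from above but only a handful of samples. The nested torch.autograd.grad loop is just to illustrate the [B, N, N] shape, not something I’d run on the full batch:

# Per-sample Hessian of the loss w.r.t. all parameters, built row by row.
# This costs O(B * N) backward passes, so it is only usable for tiny B and N.
params = [p for p in net.parameters() if p.requires_grad]
N = sum(p.numel() for p in params)
x_small = x[:4]                                    # keep B small for illustration
B = x_small.shape[0]

H = torch.zeros(B, N, N)
for b in range(B):
    yb = net(x_small[b:b + 1])
    loss_b = yb.sum(dim=-1).mean()                 # per-sample version of the loss above
    grads = torch.autograd.grad(loss_b, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])      # shape [N]
    for i in range(N):
        row = torch.autograd.grad(flat_grad[i], params,
                                  retain_graph=True, allow_unused=True)
        H[b, i] = torch.cat([
            (r if r is not None else torch.zeros_like(p)).reshape(-1)
            for r, p in zip(row, params)
        ])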
Thank you for your help!