RuntimeError: element 0 of tensors does not require grad when trying to recreate TensorFlow function with torch.autograd.grad

So, I am trying to recreate this TensorFlow function in PyTorch, but it doesn’t seem to be working:


d_previous = tf.placeholder("float32")
d_logit = fwd_gradients(T("softmax2_pre_activation"), T(layer_name), d_previous)[0]


def fwd_gradients(ys, xs, d_xs):
  
  """Forward-mode pushforward analogous to the pullback defined by tf.gradients.
  With tf.gradients, grad_ys is the vector being pulled back, and here d_xs is
  the vector being pushed forward.
  
  By mattjj@google.com from
  https://github.com/renmengye/tensorflow-forward-ad/issues/2
  """

  v = tf.zeros_like(ys)
  g = tf.gradients(ys, xs, grad_ys=v)
  return tf.gradients(g, v, grad_ys=d_xs)
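
(For context, as I understand the trick: the first tf.gradients call computes g = J^T v for a dummy vector v, and since g is linear in v, differentiating g with respect to v with grad_ys=d_xs pushes d_xs forward through the Jacobian, i.e. it yields J @ d_xs, the forward-mode Jacobian-vector product.)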

And my attempt at recreating the function in PyTorch:

def fwd_gradients(ys, xs):
    v = torch.zeros_like(ys)
    g = torch.autograd.grad(
            outputs=[ys],
            inputs=xs,
            grad_outputs=[v],
            retain_graph=True,
        )[0]
    out = torch.autograd.grad(
            outputs=[g],
            inputs=v,
            grad_outputs=torch.zeros_like(xs),
            retain_graph=True,
        )[0]

Currently the above PyTorch code results in the following error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

How do I get the function working correctly?

Hi,

You need to give create_graph=True to the first call to grad if you plan on doing a backprop through it.

Also, I don’t think you want to give torch.zeros_like(xs) to the second grad call. It should be the d_xs from the TensorFlow version.
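
To see why create_graph=True matters, here is a minimal sketch on a toy function (unrelated to the network above):

import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()

# Without create_graph=True, the gradient returned by the first grad call is
# detached from the graph, and a second grad() through it raises the same
# "element 0 of tensors does not require grad" error.
g = torch.autograd.grad(y, x, create_graph=True)[0]  # g = 2 * x, still differentiable

# Because g now has a grad_fn, we can differentiate through it again.
gg = torch.autograd.grad(g.sum(), x)[0]  # d/dx sum(2 * x) = 2 for every element
print(gg)  # tensor([2., 2., 2.])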

Thank you for the help! I guess I don’t need retain_graph=True as well?

The function seems to work now!

def fwd_gradients(ys, xs, d_xs):
    # Dummy cotangent vector that we differentiate with respect to below.
    v = torch.zeros_like(ys)
    v.requires_grad = True
    # First (reverse-mode) pass: g = J^T v. create_graph=True keeps the result
    # differentiable with respect to v.
    g = torch.autograd.grad(
            outputs=[ys],
            inputs=xs,
            grad_outputs=[v],
            create_graph=True,
        )[0]

    # Second pass: g is linear in v, so differentiating it w.r.t. v with
    # grad_outputs=d_xs pushes d_xs forward through the Jacobian: J @ d_xs.
    out = torch.autograd.grad(
            outputs=[g],
            inputs=v,
            grad_outputs=d_xs,
        )[0]
    return out

Also, since PyTorch doesn’t use placeholders, d_xs can probably just be changed to a ones tensor with the correct size.

If your function has a single input, you can indeed give it a single 1 to get the full gradient.
Otherwise, you want to give a grad input (d_xs) that matches what you want to compute.
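
For example, something along these lines (a minimal sketch on a toy elementwise function, just to illustrate the ones-vector case; the shapes and names are made up):

import torch

def fwd_gradients(ys, xs, d_xs):
    # Same function as above, compacted.
    v = torch.zeros_like(ys, requires_grad=True)
    g = torch.autograd.grad(ys, xs, grad_outputs=v, create_graph=True)[0]
    return torch.autograd.grad(g, v, grad_outputs=d_xs)[0]

x = torch.randn(5, requires_grad=True)
y = x ** 2                         # elementwise, so the Jacobian is diag(2 * x)

d_xs = torch.ones_like(x)          # push a ones vector forward
jvp = fwd_gradients(y, x, d_xs)    # J @ ones = 2 * x

print(torch.allclose(jvp, 2 * x))  # True

I believe newer PyTorch versions also provide torch.autograd.functional.jvp, which computes the same quantity.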