Batch-differentiating with respect to a single parameter

Hi Evan!

You can’t eliminate the loop using backward-mode autograd. Autograd
will let you compute the derivatives of a single scalar (e.g., a loss) with
respect to a batch of variables (i.e., compute the gradient) in a single
pass, but it won’t compute the derivatives of a batch of results with
respect to a single variable in a single pass.
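
For contrast, here is a minimal sketch (with a made-up parameter tensor) of the
single-pass case that backward-mode autograd does handle: one scalar loss
differentiated with respect to a whole tensor of parameters in a single
backward() call:

>>> import torch
>>> params = torch.randn(10, requires_grad=True)   # a batch of parameters
>>> loss = (params**2).sum()                       # a single scalar result
>>> loss.backward()                                # one backward pass ...
>>> params.grad.shape                              # ... gives d(loss) / d(params[i]) for every i
torch.Size([10])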

Yes, forward-mode autograd will do this for you in a single pass (but with the
proviso that forward-mode autograd is still in beta / experimental).

I have not used forward-mode autograd for anything real, so I can't
speak to its stability (or performance), but here is an illustration applied
to your toy example:

>>> import torch
>>> print (torch.__version__)
1.12.0
>>>
>>> x = torch.arange (10)
>>> C = torch.tensor (-5., requires_grad=True)
>>> y = C * x
>>>
>>> # compute "batch-derivative" with loop over backward-mode autograd
>>> resultA = torch.tensor (tuple (torch.autograd.grad (yi, C, retain_graph=True)[0] for yi in y))
>>> resultA
tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>>
>>> C.requires_grad = False
>>>
>>> # compute "batch-derivative" with one pass of forward-mode autograd
>>> Ct = torch.tensor (1.0)
>>> with torch.autograd.forward_ad.dual_level():
...     C_dual = torch.autograd.forward_ad.make_dual (C, Ct)
...     y_dual = C_dual * x
...     resultB = torch.autograd.forward_ad.unpack_dual (y_dual).tangent
...
>>> resultB
tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>> torch.equal (resultB, resultA)
True
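
As an aside (this goes beyond the 1.12 transcript above, so treat it as an
assumption on my part): in newer versions of PyTorch (2.0 and later) the
functional torch.func.jvp API wraps the same forward-mode machinery, so you can
get the batch-derivative without handling dual tensors yourself. A minimal
sketch:

>>> import torch
>>> x = torch.arange(10.)        # float x, to keep the sketch dtype-simple
>>> C = torch.tensor(-5.0)
>>> # push the tangent dC = 1.0 through the function in a single forward pass
>>> y, dy_dC = torch.func.jvp(lambda c: c * x, (C,), (torch.tensor(1.0),))
>>> dy_dC
tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])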

Best.

K. Frank