Here, computing gradients is an end in itself, not a means to the end of minimizing some function.

I have a vector x whose elements must always sum to 1. I want to know how much f(x) changes for valid infinitesimal changes in x, that is, how much f varies over small changes that keep x summing to 1.

Naively using .backward() and ordinary gradients would consider infinitesimal changes along the coordinate axes of the ambient space of x. But x + delta may then no longer be a valid vector, so this strategy cannot serve my purpose.

So:

How can I measure the gradient taken only in valid directions?
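To make the problem concrete, here is a small demonstration (with an arbitrary f chosen purely for illustration) that a naive gradient step pushes x off the constraint surface:

```python
import torch

# a vector constrained to sum to 1
x = torch.tensor([0.2, 0.3, 0.5], requires_grad=True)

f = (x ** 2).sum()   # an arbitrary differentiable f, for illustration only
f.backward()

# take a small step along the raw gradient in the ambient space
step = x.detach() - 0.1 * x.grad
print(step.sum())    # no longer 1: the naive step leaves the constraint surface
```

Here `x.grad` is `2 * x`, which sums to 2, so the stepped vector sums to 0.8 rather than 1.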

You have a vector, x, that satisfies a constraint, g(x) = 0 (where,
in your case, g(x) = x.sum() - 1).

Your constraint defines a hypersurface in "x" space (in your specific
case, a hyperplane), and you only want to consider infinitesimal
changes to x that lie in this hypersurface.

The gradient of your constraint function, g, is perpendicular to your
constraint hypersurface, so you want to subtract off ("project away")
any component of your gradient of f that is perpendicular to the
constraint surface (that is, any component that is parallel to the
normal (the vector perpendicular) to your constraint surface).

So you could use pytorch's autograd and backward() to separately
compute grad_f and grad_g. Then the constrained gradient you
want is:
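grad_f - ((grad_f * grad_g).sum() / (grad_g * grad_g).sum()) * grad_g

This is the standard vector projection of grad_f onto the constraint surface. A minimal sketch, with an arbitrary f chosen only for illustration:

```python
import torch

x = torch.tensor([0.2, 0.3, 0.5], requires_grad=True)  # sums to 1

f = (x ** 2).sum()   # an arbitrary differentiable f, for illustration only
g = x.sum() - 1.0    # the constraint g(x) = 0

# compute both gradients separately with autograd
grad_f, = torch.autograd.grad(f, x)
grad_g, = torch.autograd.grad(g, x)

# project away the component of grad_f along the constraint normal grad_g
grad_constrained = grad_f - (grad_f @ grad_g) / (grad_g @ grad_g) * grad_g

# grad_constrained now lies in the constraint surface
print(grad_constrained @ grad_g)  # ≈ 0
```

For this particular constraint, grad_g is the all-ones vector, so the projection reduces to subtracting the mean of grad_f from each of its components.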