Here, computing gradients is an end in itself, not a means to the end of minimizing some function.
I have a vector
x whose elements must always sum to 1. I want to know how much
f(x) changes for valid infinitesimal changes in
x. That is, how much does
f vary over small changes in the regime where
x sums to 1.
.backward() and traditional gradients would naively consider infinitesimal changes along the coordinate axes in the ambient space of
x, but x + delta may no longer be a valid vector (its elements need not sum to 1), so this strategy cannot be used for my purpose.
How can I measure the gradient taken only in valid directions?
You have a vector,
x, that satisfies a constraint,
g(x) = 0 (where,
in your case,
g(x) = x.sum() - 1).
Your constraint defines a hypersurface in “
x” space (in your specific
case, a hyperplane), and you only want to consider infinitesimal
x that lie in this hypersurface.
The gradient of your constraint function,
g, is perpendicular to your
constraint hypersurface, so you want to subtract off (“project away”)
any component of your gradient of
f that is perpendicular to the
constraint surface (that is, any component that is parallel to the
normal (the vector perpendicular) to your constraint surface).
So you could use pytorch’s autograd and
backward() to separately compute grad_f and
grad_g. Then the constrained gradient you want is
grad_f_constrained = grad_f - (grad_f.dot(grad_g) / (grad_g**2).sum()) * grad_g
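To make this concrete, here is a minimal sketch of the projection. The choice of f is a placeholder (a sum of squares), and the specific values in x are arbitrary, they are not from your setup; the constraint g(x) = x.sum() - 1 is the one from your question:

```python
import torch

# Arbitrary starting point on the constraint surface (sums to 1)
x = torch.tensor([0.2, 0.3, 0.5], requires_grad=True)

# Placeholder objective -- substitute your own f
f = (x ** 2).sum()
grad_f, = torch.autograd.grad(f, x)

# Constraint g(x) = x.sum() - 1 = 0
g = x.sum() - 1.0
grad_g, = torch.autograd.grad(g, x)

# Subtract off the component of grad_f along grad_g (the surface normal),
# leaving only the component tangent to the constraint surface
grad_f_constrained = grad_f - (grad_f.dot(grad_g) / (grad_g ** 2).sum()) * grad_g

print(grad_f_constrained)
```

As a sanity check, grad_f_constrained.dot(grad_g) should be (numerically) zero, confirming the projected gradient lies in the tangent hyperplane where perturbations leave x.sum() unchanged.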
This makes sense, I’ll circle back if I have questions during the implementation. Thanks!