Hi Leon!
I don’t really follow your embedding scheme, and I can’t comment
on your code.
But if I understand your goal correctly, perhaps you can project your gradients onto your desired subspace after an unmodified backward() step.
The common optimizers use some variant of gradient descent where params -= learning_rate * gradient (where gradient might have some momentum history in it, but this doesn't change the idea). So just do the same update step, but replace gradient with a projected_gradient that lies in your subspace. (You would have to tweak or write your own optimizer to do this.)
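Here is a minimal sketch of the projected-gradient version. It's a toy example, not your actual setup: p is a single flat parameter tensor, Q is a made-up matrix whose orthonormal columns span the subspace, and the update is plain SGD with no momentum.

```python
import torch

# toy sketch only -- Q and the shapes here are illustrative, not from your code
n, k = 10, 3
Q, _ = torch.linalg.qr(torch.randn(n, k))     # orthonormal basis for a random k-dim subspace
p = torch.zeros(n, requires_grad=True)        # parameters start in the subspace (at zero)

def projected_sgd_step(p, Q, lr):
    # plain-vanilla SGD, but with the gradient projected onto span (Q)
    with torch.no_grad():
        g_proj = Q @ (Q.T @ p.grad)           # orthogonal projection of the gradient
        p -= lr * g_proj                      # params -= learning_rate * projected_gradient

loss = ((p - torch.ones(n))**2).sum()         # some stand-in loss
loss.backward()                               # unmodified backward()
projected_sgd_step(p, Q, lr=0.1)
p.grad.zero_()                                # clear the gradient for the next iteration
```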
Alternatively, you could perform the standard update (and not have
to modify the optimizer), and then project your new parameters to
lie in your subspace. (These two approaches are equivalent*, but I
think in terms of projecting the gradient rather than the parameters
for some reason.)
*) “Project gradient” and “project (updated) parameters” are equivalent if your subspace is understood to contain the zero vector (that is, it is a genuine linear subspace). In that case the projection is a linear map, so projecting the updated parameters gives the same result as updating with the projected gradient, provided the parameters start out in the subspace. From your stated initial condition, I understand this to be the case.
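And here is the same toy example done the second way: a stock optimizer takes its standard step, and then the updated parameters are projected back onto the subspace (again, Q is just an illustrative orthonormal basis, not something from your code).

```python
import torch

# same toy setup:  Q is an illustrative orthonormal basis;  p starts at zero,
# so it starts out in the subspace
n, k = 10, 3
Q, _ = torch.linalg.qr(torch.randn(n, k))
p = torch.zeros(n, requires_grad=True)
opt = torch.optim.SGD([p], lr=0.1)

loss = ((p - torch.ones(n))**2).sum()         # some stand-in loss
loss.backward()
opt.step()                                    # standard, unmodified optimizer update
with torch.no_grad():
    p.copy_(Q @ (Q.T @ p))                    # project the new parameters back onto span (Q)
opt.zero_grad()
```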
Best.
K. Frank