Help with Projecting Gradients onto a Hypersphere's Tangent Plane

Hi everyone,

I am training a neural network with a loss function L, and I want to constrain the network's outputs z so that they lie on a hypersphere of dimension d.
To keep training consistent with this constraint, I need the backpropagated gradients to respect it, meaning they should be tangent to the hypersphere at the corresponding points.
To achieve this, I plan to project the gradients ∂L/∂z onto the tangent plane of the hypersphere at z. Based on my understanding, this projection can be computed as:

g_tangent = g − (g ⋅ z_normalized) z_normalized,

where g = ∂L/∂z and z_normalized is z scaled to unit norm.
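In code, the projection I have in mind would look roughly like this (the shapes and variable names below are just illustrative toy values, not my actual network):

import torch

# toy stand-ins for the network output z and its gradient g = dL/dz
z = torch.randn(4, 8)                                   # batch of 4 outputs in R^8
g = torch.randn(4, 8)                                   # gradient of the loss w.r.t. z

z_normalized = z / z.norm(dim=1, keepdim=True)          # corresponding points on the unit hypersphere

radial = (g * z_normalized).sum(dim=1, keepdim=True)    # component of g along z_normalized
g_tangent = g - radial * z_normalized                   # remove the radial component

print((g_tangent * z_normalized).sum(dim=1))            # ~0: g_tangent is tangent to the sphere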

However, I’m having difficulty accessing these gradients, modifying them, and ensuring the loss is properly backpropagated after applying this projection.

I’d greatly appreciate any feedback, suggestions, or guidance on implementing this process. Thank you so much for your help!

Registering a backward hook with torch.Tensor.register_hook would let you access the gradient during the backward pass and modify it before it is propagated further: torch.Tensor.register_hook — PyTorch 2.5 documentation
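
For example, here is a minimal sketch of how a hook could apply your projection (the helper name and the toy loss below are made up for illustration, not part of your setup):

import torch

def tangent_projection_hook(z):
    # hypothetical helper: returns a hook that projects an incoming gradient
    # onto the tangent plane of the unit hypersphere at z
    z_normalized = (z / z.norm(dim=1, keepdim=True)).detach()
    def hook(grad):
        radial = (grad * z_normalized).sum(dim=1, keepdim=True)
        return grad - radial * z_normalized
    return hook

z = torch.randn(4, 8, requires_grad=True)     # stand-in for the network's output
z.register_hook(tangent_projection_hook(z))

loss = (z - 1.0).pow(2).sum()                 # placeholder loss
loss.backward()

# z.grad was projected by the hook and is now tangent to the hypersphere:
print((z.grad * (z / z.norm(dim=1, keepdim=True)).detach()).sum(dim=1))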

Does this sound useful for your case?

Hi, thank you for your answer!
I'm just looking into this function now, and it looks like it could work 🙂

Hi jumdc!

I’m not sure I follow what you are asking, but if I understand your use case, it should suffice to project your z onto the hypersphere and then compute your loss function using the values of the projected z. The gradients with respect to the unprojected z will then naturally be tangent to the hypersphere, because the objective function is now independent of any changes in z normal to the hypersphere (any such changes get projected away).

Here is an illustrative script:

import torch
print (torch.__version__)

_ = torch.manual_seed (2024)

t = torch.tensor ([2.0, 2.0])                            # two-dimensional target point
x = torch.randn (5, 2, requires_grad = True)             # batch of five two-dimensional starting points

print ('x ...')
print (x)

y = x / (x**2).sum (dim = 1, keepdim = True).sqrt()      # project x onto unit circle

print ('y = x projected onto unit circle ...')
print (y)

torch.nn.MSELoss (reduction = 'sum') (x, t).backward()

print ('x.grad (no projection) ...')
print (x.grad)
print ('x.grad <dot> x ...')
print ((x * x.grad).sum (dim = 1))                       # x.grad not tangent to unit circle

x.grad = None

torch.nn.MSELoss (reduction = 'sum') (y, t).backward()

print ('x.grad (with projection) ...')
print (x.grad)
print ('x.grad <dot> x ...')
print ((x * x.grad).sum (dim = 1))                       # this version of x.grad is tangent to unit circle

And here is its output:

2.5.1
x ...
tensor([[-0.0404,  1.7260],
        [-0.8140,  1.3722],
        [ 0.5060, -0.4823],
        [-0.7853,  0.6681],
        [-0.4439,  0.1888]], requires_grad=True)
y = x projected onto unit circle ...
tensor([[-0.0234,  0.9997],
        [-0.5102,  0.8600],
        [ 0.7238, -0.6900],
        [-0.7616,  0.6480],
        [-0.9202,  0.3914]], grad_fn=<DivBackward0>)
x.grad (no projection) ...
tensor([[-4.0809, -0.5480],
        [-5.6281, -1.2557],
        [-2.9881, -4.9647],
        [-5.5705, -2.6638],
        [-4.8879, -3.6223]])
x.grad <dot> x ...
tensor([-0.7809,  2.8585,  0.8828,  2.5946,  1.4859], grad_fn=<SumBackward1>)
x.grad (with projection) ...
tensor([[ -2.3698,  -0.0555],
        [ -2.9546,  -1.7528],
        [ -5.5822,  -5.8556],
        [ -3.5439,  -4.1655],
        [ -4.2567, -10.0074]])
x.grad <dot> x ...
tensor([-7.4506e-09,  2.3842e-07,  0.0000e+00,  4.7684e-07, -5.9605e-07],
       grad_fn=<SumBackward1>)

Best.

K. Frank

Hi Frank,

Thanks a lot for your detailed answer and insights.

Best,
Julie