Hi Denver!

First a brief word on mathematical terminology: An *n-sphere* is described by n (independent) parameters and is often thought of as being embedded in (n+1)-space. So a 2-sphere is the (two-dimensional) surface of a three-dimensional ball (embedded in 3-space), and a 1-sphere is just the ordinary circle.

You most likely *don’t* want to implement these equations, because there are many locations where they are singular (for example, the azimuthal angle of a 2-sphere is undefined at the poles). (Also, the different parameters are rather different in character.)

For most purposes you are much better off using n+1 slightly redundant parameters to describe an n-sphere, namely the n+1 coordinates of a point in (n+1)-space that is constrained to be a distance of 1 away from the origin. (This constraint describes / eliminates the redundancy.)

Using `softmax` here is likely to be sub-optimal because, among other reasons, the “geometry” of `softmax` doesn’t really match well with the geometry of a sphere. (`softmax` produces values that are all positive and sum to one – points on the probability simplex – not points on a sphere.)

If you want the output of your model to be an n-sphere, you should have your model output n+1 unbounded real values (e.g., the n+1 outputs of a final `Linear` layer), and then normalize that (n+1)-dimensional vector to have unit norm. (That extra degree of freedom that is normalized away isn’t really a problem. Networks, in general, contain lots of redundancy.)
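
For concreteness, here is a minimal sketch of such a final layer (the feature size of 64, the choice n = 2, and the class name are just illustrative assumptions):

```
import torch

class SphereHead(torch.nn.Module):
    """Final layer that maps features to points on the n-sphere."""
    def __init__(self, in_features: int = 64, n: int = 2):
        super().__init__()
        # n+1 unbounded real outputs
        self.linear = torch.nn.Linear(in_features, n + 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self.linear(x)
        # project onto the unit sphere (the eps inside normalize
        # guards against division by zero when v is tiny)
        return torch.nn.functional.normalize(v, dim=-1)

head = SphereHead()
points = head(torch.randn(5, 64))   # batch of 5 points on the 2-sphere
print(points.norm(dim=-1))          # all (approximately) 1.0
```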

I suggest directly normalizing your (n+1)-dimensional output vector, rather than passing it through `softmax`, but the general concept is fully analogous.

As you say, because the norm of your (pre-normalization) vector doesn’t enter into your loss function, there’s nothing that keeps it from running off to infinity (or zero). But it’s easy to stabilize. Just add a penalty like

```
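# penalize (squared) deviation of the pre-normalization norm from 1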
stabilization_loss = (1.0 - output_norm**2)**2
```

to your total loss.
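
Put together, a training step might look something like this sketch (the model, the placeholder task loss, and the 0.01 penalty weight are all just illustrative assumptions):

```
import torch

# illustrative stand-ins for your actual model, data, and task loss
model = torch.nn.Linear(64, 3)                 # outputs n+1 = 3 raw values
inputs = torch.randn(5, 64)
targets = torch.nn.functional.normalize(torch.randn(5, 3), dim=-1)

v = model(inputs)                              # raw (n+1)-dimensional outputs
output_norm = v.norm(dim=-1)                   # pre-normalization norms
points = v / output_norm.unsqueeze(-1)         # unit vectors on the 2-sphere

task_loss = ((points - targets)**2).mean()     # placeholder task loss
stabilization_loss = ((1.0 - output_norm**2)**2).mean()
loss = task_loss + 0.01 * stabilization_loss   # the 0.01 weight is illustrative
loss.backward()
```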

Note, you don’t care whether your `stabilization_loss` forces your output vector to have a norm that is exactly (or very close to) 1 – you just want to keep the norm from running off to infinity or collapsing to zero (where the normalization would become singular).

Best.

K. Frank