I am trying to get an autoencoder to work on my L2-normalized dataset. For now I use the MSE loss, which works okay, but I noticed that a lot of the reconstructed vectors are not nearly on the unit sphere, which they should be, since I know every input vector is on there. Is there any simple solution to penalize the decoder, or enforce that it generates unit-length vectors? I tried calculating the L2 norm of each item in a batch and dividing the output by that, but I am not sure this will give the correct results for the computation graph.

To answer this specific question, yes, you can add a loss term
that will push your predictions onto the unit sphere.

When the (L2) norm of your prediction is 1, the prediction is on
the unit sphere. So just add a unit-sphere mismatch term:

sphere_loss = fac * ((preds**2).sum(dim=1) - 1.0).abs().mean()
loss += sphere_loss
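Put together, a minimal, self-contained sketch might look like this (the weighting factor fac and the random tensors are placeholders for your own decoder output and targets):

```python
import torch
import torch.nn.functional as F

# stand-ins for decoder output and L2-normalized targets
preds = torch.randn(4, 8, requires_grad=True)
target = F.normalize(torch.randn(4, 8), dim=1)

fac = 0.1  # hypothetical weighting factor -- tune for your problem
mse_loss = F.mse_loss(preds, target)
# penalize the deviation of each row's squared norm from 1
sphere_loss = fac * ((preds**2).sum(dim=1) - 1.0).abs().mean()
loss = mse_loss + sphere_loss
loss.backward()  # gradients flow through both terms
```

Note that this compares the squared norm to 1, which is zero exactly when the norm itself is 1, so no sqrt() is needed.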

(You could also take the square of the mismatch, rather than abs().
That would give you a softer penalty, but I don’t see any reason to
prefer using the square.)

As a practical matter, there might be some benefit to doing this,
but it shouldn’t be necessary. As you train, your predictions will
tend to drift outward from the unit sphere (barring some other
regularization), but nothing is actively* pushing them away.

As an aside, Alex is right that normalizing your predictions is
the proper way to go (with or without the sphere_loss term).
Doing so won’t break autograd, or otherwise cause problems.
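To make that concrete, here is a small sketch showing that dividing by the per-row norm is an ordinary differentiable op that autograd tracks like any other (the tensor here is a hypothetical stand-in for your decoder output):

```python
import torch

raw = torch.randn(4, 8, requires_grad=True)  # stand-in for decoder output
# divide each row by its L2 norm; autograd records this op
preds = raw / raw.norm(dim=1, keepdim=True)
# equivalently: preds = torch.nn.functional.normalize(raw, dim=1)

preds.sum().backward()  # backprop works through the normalization
```

After the division, every row of preds has norm 1 exactly, and raw.grad is populated as usual.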

*) When you take a finite step tangent to the unit sphere, you
will move, to second order, outward off the unit sphere.