I am trying to train a model to output the correct angle. I am currently restricting my outputs to [0, 2pi]:

2 * torch.pi * torch.sigmoid(logits)

My question is: What if the prediction is something like 1.9pi, but the target is 0.1pi? The loss should in theory be something like 0.2pi, but in my current situation it will be 1.8pi.
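To make the numbers concrete, here is a small sketch in plain Python comparing the naive difference with the shortest distance around the circle. The helper names `naive_diff` and `wrapped_diff` are made up for illustration only.

```python
import math

def naive_diff(pred, target):
    # Plain absolute difference, ignores that angles wrap around at 2*pi
    return abs(pred - target)

def wrapped_diff(pred, target):
    # Shortest distance between the two angles going around the circle
    d = (pred - target) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

pred, target = 1.9 * math.pi, 0.1 * math.pi
print(naive_diff(pred, target) / math.pi)    # roughly 1.8
print(wrapped_diff(pred, target) / math.pi)  # roughly 0.2
```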

I cannot use a modular loss function, since the angle is not the actual end product of my targets but rather an intermediate step towards computing the final output. Can a model instead output a modular value? I am afraid that performing logits % (2 * torch.pi) will lead to logits spinning out of control.

Rather than outputting a single value that is your angle, I think it makes
more geometric sense to output two values, x and y, on the ray that
defines your angle. Then regularize your x and y by pushing them
towards (but not constraining them to lie on) the unit circle with a term
like loss_reg = (1.0 - (x**2 + y**2))**2.
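As a rough sketch of what this could look like in PyTorch: the module name `AngleHead`, the feature size, and the helper `unit_circle_reg` are all assumptions for illustration, and how you weight the regularizer against your real loss is something you would have to tune.

```python
import torch
import torch.nn as nn

class AngleHead(nn.Module):
    """Hypothetical output head that predicts (x, y) instead of one angle."""

    def __init__(self, in_features):
        super().__init__()
        self.linear = nn.Linear(in_features, 2)

    def forward(self, features):
        xy = self.linear(features)
        # Unconstrained x and y; the regularizer nudges them toward the unit circle
        return xy[..., 0], xy[..., 1]

def unit_circle_reg(x, y):
    # Same term as in the text, averaged over the batch
    return ((1.0 - (x**2 + y**2)) ** 2).mean()

torch.manual_seed(0)
head = AngleHead(in_features=8)   # 8 is an arbitrary feature size
x, y = head(torch.randn(4, 8))    # batch of 4 dummy feature vectors
loss_reg = unit_circle_reg(x, y)
# In training you would add this to your real loss, e.g.
# total_loss = task_loss + reg_weight * loss_reg   (reg_weight is a tunable assumption)
```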

You do now have the redundancy of using two values to represent a
single angle, but geometrically this representation is "unbiased," so
I think this outweighs any disadvantage of the redundancy.

If, at some point, you need an actual angle, you can use theta = torch.atan2(y, x), but I would avoid converting x and y to an angle until you actually need to.
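A minimal sketch of that conversion, assuming you want the angle in [0, 2pi) to match the original setup. Note that torch.atan2 returns values in [-pi, pi], so the sketch shifts them with torch.remainder. The helper names `angle_from_xy` and `cos_sin_from_xy` are hypothetical; the second one shows how downstream code that only needs cos and sin can skip the angle entirely.

```python
import torch

def angle_from_xy(x, y):
    # atan2 gives an angle in [-pi, pi]; remainder maps it into [0, 2*pi)
    theta = torch.atan2(y, x)
    return torch.remainder(theta, 2 * torch.pi)

def cos_sin_from_xy(x, y, eps=1e-8):
    # Normalize onto the unit circle without ever forming the angle
    r = torch.sqrt(x**2 + y**2).clamp_min(eps)
    return x / r, y / r

# A point on the ray at 1.9*pi, deliberately not on the unit circle
angle = torch.tensor([1.9 * torch.pi])
x, y = 2.0 * torch.cos(angle), 2.0 * torch.sin(angle)

theta = angle_from_xy(x, y)        # close to 1.9*pi
cos_t, sin_t = cos_sin_from_xy(x, y)
```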