Applying logit to only certain outputs creates infs

I have a network with a linear head followed by a sigmoid activation function. Most of my output neurons should be in a range 0-1 and thus sigmoid makes sense, but some of the neurons should output normalized (standardized) colors.

So for the last three neurons I do torch.logit(x) as they can be in the range [-1.3, 1.4]. I still encounter errors sometimes because some of the color neurons output 1.0, and torch.logit(1.0) gives inf.

Should I use something other than torch.logit() after my activation functions for these special neurons? Maybe rescale (not standardize!) so that the range [0, 1] becomes [-1.3, 1.4]?

Hi Zimo!

Instead of applying sigmoid() to all of your neurons and then applying
logit() to your color neurons to undo the sigmoid(), apply sigmoid()
only to your non-color neurons so that the color neurons never get
transformed.
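
Here is a minimal sketch of what I mean. It assumes the color neurons are
the last three columns of the Linear head's output; the name split_head
and the six-column layout are made up for illustration.

```python
import torch

NUM_COLOR = 3  # assumption: the color neurons are the last three outputs


def split_head(raw: torch.Tensor) -> torch.Tensor:
    """Apply sigmoid() to every column except the trailing color columns."""
    bounded = torch.sigmoid(raw[..., :-NUM_COLOR])  # squashed into (0, 1)
    color = raw[..., -NUM_COLOR:]                   # passed through untouched
    return torch.cat([bounded, color], dim=-1)


# Hypothetical raw output of the Linear head: 3 bounded values, 3 colors.
raw = torch.tensor([[0.0, 50.0, -50.0, 1.4, -1.3, 100.0]])
out = split_head(raw)
```

Because the color columns never pass through sigmoid(), there is nothing
to undo with logit(), so no inf can be produced by that step.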

(Applying sigmoid() to a large-enough (non-inf) value will map it
to 1.0, causing some loss of information, and you can’t undo the
numerical “damage” by applying logit(), even though mathematically
logit() is the inverse function of sigmoid().)
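
A short illustration of that round trip, assuming the default float32
dtype (the value 100.0 is just an example):

```python
import torch

x = torch.tensor(100.0)      # large but finite logit
p = torch.sigmoid(x)         # rounds to exactly 1.0 in float32
roundtrip = torch.logit(p)   # log(1 / 0) gives inf, not the original 100

# logit() accepts an optional eps that clamps its input to [eps, 1 - eps].
# The result is finite, but the original value is still not recovered.
clamped = torch.logit(p, eps=1e-6)
```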

As an aside, you should consider whether you really want to apply
sigmoid() to any of your neurons. What subsequent processing
do you do to your non-color neurons, and could you perform that
processing in “log space,” that is, working with the logits that are
output by your Linear head? Doing so might prove more numerically
stable.
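
As one example of what working with the logits can look like, suppose
(this is only an assumption, since I don't know your loss) that a
non-color neuron were trained with binary cross entropy. PyTorch has a
version that takes the logits directly:

```python
import torch
import torch.nn.functional as F

# Assumed setup: one very negative logit whose target is 1.
logit = torch.tensor([-200.0], requires_grad=True)
target = torch.tensor([1.0])

# In float32 the probability underflows to exactly 0, so the information
# about how negative the logit was is already gone at this point.
p = torch.sigmoid(logit.detach())

# The logit-space loss is computed without ever forming that probability.
loss = F.binary_cross_entropy_with_logits(logit, target)
loss.backward()  # gradient is sigmoid(logit) - target
```

Here the loss stays finite and the gradient remains usable even though
sigmoid() of the same logit has already collapsed to 0.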

Best.

K. Frank

Thanks KFrank, great answer!

A lot of the other neurons correspond to 3D transformations of objects. Here is how I currently rescale the outputs:

  • XYZ position in a restricted 1x1x1 box. Constrained by sigmoid.

  • XYZ rotation. Currently constrained by sigmoid, then rescaled to [0, 2*pi].

  • Width, height, depth. Constrained by sigmoid + epsilon, since I want a nonzero minimum size and 1x1x1 is the largest possible size in the restricted box.

This is still a project in its early stage, so I might add some more outputs. Maybe ones that are restricted to something like [-5, 200]. How should I rescale logits to conform to these values? Maybe I shouldn’t and instead try to get the model to do that fitting for me?
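
One possible way to do that rescaling, as a rough sketch rather than a recommendation: squash the raw output with sigmoid and then stretch it linearly to the target interval. The helper name squash_to_range and the 1e-3 minimum size are hypothetical.

```python
import math
import torch


def squash_to_range(raw: torch.Tensor, low: float, high: float) -> torch.Tensor:
    """Map unbounded raw outputs into [low, high] via a scaled sigmoid."""
    return low + (high - low) * torch.sigmoid(raw)


raw = torch.tensor([-100.0, 0.0, 100.0])

generic = squash_to_range(raw, -5.0, 200.0)         # e.g. the [-5, 200] case
rotation = squash_to_range(raw, 0.0, 2 * math.pi)   # rotation angles
size = squash_to_range(raw, 1e-3, 1.0)              # assumed minimum size 1e-3
```

The same helper would cover the rotation and size cases listed above. Note that very large or very small raw values saturate the sigmoid, so gradients can become vanishingly small near the ends of the interval.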

The problem has been that 3D objects that fall outside of the 1x1x1 space (as in XYZ position) or objects that get a width/height of 0 get an inf/nan gradient.