Normalizing the output could work, e.g. if your target has specific bounds.
In the past I’ve worked on a keypoint detection use case where the target coordinates were in the range [0, 96].
While the model might have been able to learn this distribution as well, training was faster after normalizing the targets to [0, 1]
and “de-normalizing” the predictions back to [0, 96]
to calculate the RMSE. I don’t know if you could apply a similar workflow, or if your target range is much larger and the normalization would “squeeze” small values into a tiny range.
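A minimal sketch of that workflow (the `96.0` bound and the random tensors are just placeholders for your own targets and model outputs):

```python
import torch
import torch.nn.functional as F

target_max = 96.0  # hypothetical upper bound of the target range

# Placeholder keypoint targets in the original range [0, 96]
targets = torch.rand(8, 30) * target_max

# Normalize to [0, 1] and train the model on these normalized targets
targets_norm = targets / target_max

# Placeholder for model outputs in [0, 1] (e.g. after a sigmoid)
preds_norm = torch.rand(8, 30)

# De-normalize the predictions back to [0, 96] before computing the RMSE
preds = preds_norm * target_max
rmse = torch.sqrt(F.mse_loss(preds, targets))
```

The same `target_max` constant is used in both directions, so the RMSE is reported in the original coordinate units.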