Validation and training loss almost constant

Normalizing the output could work e.g. if your target has specific bounds.
E.g. in the past I’ve worked on a keypoint detection use case where the target coordinates were in the range [0, 96]. While the model might have been able to learn this distribution directly, training was faster when normalizing the targets to [0, 1] and “de-normalizing” them back to [0, 96] to calculate the RMSE as well as the predictions. I don’t know if you could apply a similar workflow, or if your target range is much larger and the normalization would “squeeze” small values into a tiny range.
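
Here is a minimal sketch of what I mean, assuming a target range of [0, 96] as in my example (the tensor shapes and variable names are just placeholders, not from your code):

```python
import torch
import torch.nn as nn

# Assumed original target range, e.g. keypoint coordinates in [0, 96]
TARGET_MAX = 96.0

def normalize(t):
    # scale targets from [0, TARGET_MAX] to [0, 1] for training
    return t / TARGET_MAX

def denormalize(t):
    # scale normalized predictions back to [0, TARGET_MAX]
    return t * TARGET_MAX

criterion = nn.MSELoss()

# dummy tensors standing in for the real targets and model output
target = torch.rand(8, 30) * TARGET_MAX   # ground-truth coordinates in [0, 96]
output = torch.rand(8, 30)                 # model output in the normalized range

# train on the normalized targets
loss = criterion(output, normalize(target))

# report RMSE and predictions in the original coordinate range
rmse = torch.sqrt(criterion(denormalize(output), target))
preds = denormalize(output)
```

The key point is that the loss is computed in the normalized range, while the metric and the final predictions are computed after scaling back, so the reported RMSE stays comparable to the original coordinates.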