In a simple network with a custom loss function, the outputs of the network keep growing during training. The loss decreases for a few epochs, then plateaus at a constant value.
The data are normalized, and I have used batch normalization and gradient clipping to try to keep the values bounded. I have also tried smaller learning rates with both the SGD and Adam optimizers, but the problem persists.
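For reference, here is a minimal sketch of the setup described above (assuming PyTorch; the model, data, and MSE loss are placeholders, not the actual code) that logs the output magnitude alongside the loss, which makes the "outputs grow while loss stays flat" symptom directly visible:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder model with batch normalization, as described in the question.
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.BatchNorm1d(32),
    nn.ReLU(),
    nn.Linear(32, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x, y = torch.randn(64, 10), torch.randn(64, 1)  # stand-in data

for epoch in range(5):
    opt.zero_grad()
    out = model(x)
    loss = ((out - y) ** 2).mean()  # stand-in for the custom loss
    loss.backward()
    # Gradient clipping, as described in the question.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    opt.step()
    # If out.abs().max() keeps growing while loss stays flat,
    # the loss is likely insensitive to the output scale.
    print(epoch, loss.item(), out.abs().max().item())
```

Logging the output magnitude next to the loss at every epoch is usually the quickest way to confirm whether the two are actually coupled.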
What kind of loss function are you using?
It seems the loss value might not reflect the difference between the model output and the target, since the loss stays constant while the output keeps growing.
Based on this, I would guess the loss function implementation is buggy or shows some unexpected behavior.
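One way to test this hypothesis is to evaluate the custom loss on synthetic predictions of increasing magnitude and check that the loss value and its gradient respond. This is a hedged sketch (assuming PyTorch); `my_loss` below is a hypothetical stand-in that you would replace with the actual custom loss:

```python
import torch

def my_loss(pred, target):
    # Stand-in implementation; substitute the real custom loss here.
    return ((pred - target) ** 2).mean()

target = torch.zeros(8, 1)
for scale in (1.0, 10.0, 100.0):
    # Leaf tensor so pred.grad is populated by backward().
    pred = torch.full((8, 1), scale, requires_grad=True)
    loss = my_loss(pred, target)
    loss.backward()
    print(scale, loss.item(), pred.grad.abs().mean().item())
# If loss.item() is identical across scales, or pred.grad is zero,
# the loss is effectively detached from the output magnitude,
# which would explain a flat loss while the outputs keep growing.
```

A zero gradient here would also point at an accidental `detach()` or a non-differentiable operation inside the loss.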