In a simple network with a custom loss function, the outputs of the network keep growing during training. The loss decreases for a few epochs, then plateaus at a constant value.
The data are normalized, and I have used batch normalization and gradient clipping to try to keep the values bounded. I also tried smaller learning rates with both SGD and Adam, but the problem persists. A minimal sketch of the kind of setup I'm using is below.
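To make the setup concrete, here is a stripped-down version of the training loop. The architecture, the custom loss, and the hyperparameters shown are placeholders rather than my exact code, but the structure (batch norm in the model, gradient clipping before the optimizer step) matches what I'm doing:

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self, in_dim=10, hidden=32, out_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.BatchNorm1d(hidden),   # batch normalization
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

def custom_loss(pred, target):
    # placeholder; my actual loss is custom
    return ((pred - target) ** 2).mean()

model = SimpleNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # also tried SGD

x = torch.randn(256, 10)   # inputs are normalized
y = torch.randn(256, 1)

for epoch in range(100):
    optimizer.zero_grad()
    pred = model(x)
    loss = custom_loss(pred, y)
    loss.backward()
    # clip gradients to keep the updates bounded
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()

# symptom: pred.abs().max() keeps growing across epochs,
# while loss stops decreasing after a few epochs
```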
Any idea what the problem could be?