I have a model in which the input data has been scaled to [-1, 1]. The corresponding targets are all values around -2700. I have tried various initialization schemes and am currently using Xavier initialization with tanh activations across 2 hidden layers (5 nodes each). When I train on the unscaled targets, the model fails to train properly and plateaus at predicting just the mean of all the targets. However, when I scale the targets to [0, 1], the model trains properly. Can someone explain why I’m experiencing this? Is this normal, or is something off? I was under the impression that target scaling was unnecessary. Thanks!
The output of a tanh activation is between -1 and 1, and the network's output is usually a linear combination of those activations, such as tanh(x)*w. Your targets should therefore be in a similar range.
I should also note that the output node is just a linear activation function, so is this still the case?
Yes, the output of multiple linear transforms is still linear.
My model has hidden layers with tanh activation functions and an output node with a linear activation function, so as not to place bounds on the values the output can take. Hence my confusion as to why I still need to scale my targets.
You have to scale your targets, or you need a better initialization of the output weights.
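For the scaling option, here is a minimal sketch in NumPy. It assumes standardization (zero mean, unit variance) rather than the [0, 1] min-max scaling mentioned in the question; either should work. The synthetic targets and the helper name `unscale` are made up for illustration.

```python
import numpy as np

# Hypothetical targets clustered around -2700, as in the question.
rng = np.random.default_rng(0)
y = -2700.0 + 5.0 * rng.standard_normal(1000)

# Standardize the targets and keep the statistics so the step can be undone.
y_mean, y_std = y.mean(), y.std()
y_scaled = (y - y_mean) / y_std

def unscale(pred_scaled):
    """Map predictions from the scaled space back to the original units."""
    return pred_scaled * y_std + y_mean

# Round trip: unscaling the scaled targets recovers the originals.
y_back = unscale(y_scaled)
```

Train on `y_scaled`, then pass the model's predictions through `unscale` to get values in the original units.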
I am assuming that you are doing regression here.
The key thing is that your network's outputs should start near the average of the targets; in your case that is around -2700 if you don’t want to scale your targets. Then training only needs to move the outputs up and down a little to match the targets.
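One way to get that starting point, sketched in NumPy, is to set the output bias to the mean of the targets. This assumes the network from the question (two tanh hidden layers of 5 units, linear output); the data, the 3 input features, and the helper names `xavier` and `forward` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: inputs in [-1, 1], targets near -2700.
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = -2700.0 + 10.0 * X[:, 0] + rng.standard_normal(200)

def xavier(fan_in, fan_out):
    """Xavier/Glorot uniform initialization."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Two tanh hidden layers of 5 units and a linear output node.
W1, b1 = xavier(3, 5), np.zeros(5)
W2, b2 = xavier(5, 5), np.zeros(5)
W3 = xavier(5, 1)

def forward(X, b3):
    """Forward pass with a given output bias b3."""
    h1 = np.tanh(X @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    return (h2 @ W3 + b3).ravel()

# Initial loss with a zero output bias vs. a bias set to the target mean.
mse_zero_bias = np.mean((forward(X, np.zeros(1)) - y) ** 2)
mse_mean_bias = np.mean((forward(X, np.array([y.mean()])) - y) ** 2)
print(mse_zero_bias, mse_mean_bias)
```

With the zero bias the initial error is on the order of 2700 squared, while the mean-initialized bias starts within the spread of the targets, so the first gradient steps are small.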
If your initial outputs are too far from the targets, the loss is big and, as a result, the gradient is too big. After several updates, your weights move too far, fall into the saturation region of the tanh() activation, and stop learning. I guess this is what happened to your model.
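A rough numeric illustration of both effects, assuming a squared-error loss L = 0.5 * (pred - target)^2 and plain gradient descent. The learning rate, the hidden activation value, and the scaled target of 0.5 are illustrative choices, not values from the thread.

```python
import numpy as np

def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - np.tanh(z) ** 2

# Error signal dL/d(pred) = pred - target, with the prediction near 0 at init.
pred_at_init = 0.0
err_raw = pred_at_init - (-2700.0)   # unscaled target
err_scaled = pred_at_init - 0.5      # a target scaled into [0, 1]

# Size of one gradient step on an output weight fed by hidden activation h.
lr, h = 0.01, 0.8                    # illustrative values
step_raw = lr * err_raw * h          # about 21.6
step_scaled = lr * err_scaled * h    # about -0.004

# The same error signal is propagated back to the hidden-layer weights.
# Once their pre-activations grow large, tanh passes back almost no gradient.
grad_small_z = tanh_grad(0.5)        # roughly 0.79
grad_large_z = tanh_grad(10.0)       # on the order of 1e-8
```

A step thousands of times larger than in the scaled case can push pre-activations deep into the flat part of tanh, after which the hidden units barely update and the model is left predicting a near-constant value.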
Ahhh that makes perfect sense now. Thank you so much, that is exactly the case.