Scaling target variables

mshuaibi · March 27, 2019, 8:34pm

I have a model whose input data has been scaled to [-1, 1]. The corresponding targets are all values around -2700. I have tried various initialization schemes and am currently using Xavier initialization with tanh activations across 2 hidden layers (5 nodes each). When I train with the raw target values, the model fails to train properly and plateaus with every prediction equal to the mean of the targets. However, when I scale the targets to [0, 1], the model trains properly. Can someone explain why I’m seeing this? Is this normal, or is something off? I was under the impression that target scaling was unnecessary. Thanks!

sonnguyen · March 27, 2019, 8:50pm

The output of a tanh activation is between -1 and 1, so your targets should be in a similar range. The network output is usually a linear combination of the final activations, such as tanh(x)*w.
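
As a rough illustration of bringing the targets into the tanh range, here is a minimal sketch of min-max scaling to [-1, 1] and mapping predictions back afterwards. The helper names and the target values are made up for the example; only the "around -2700" magnitude comes from the question.

```python
def fit_minmax(targets):
    """Return (lo, hi) of the training targets; hypothetical helper."""
    lo, hi = min(targets), max(targets)
    if hi == lo:
        raise ValueError("targets are constant; nothing to scale")
    return lo, hi


def to_unit_range(t, lo, hi):
    """Map a raw target from [lo, hi] to [-1, 1]."""
    return 2.0 * (t - lo) / (hi - lo) - 1.0


def from_unit_range(s, lo, hi):
    """Map a scaled value (e.g. a model prediction) back to raw units."""
    return (s + 1.0) / 2.0 * (hi - lo) + lo


# Made-up targets clustered around -2700.
targets = [-2703.0, -2699.5, -2701.2, -2698.3, -2697.0]

lo, hi = fit_minmax(targets)
scaled = [to_unit_range(t, lo, hi) for t in targets]
restored = [from_unit_range(s, lo, hi) for s in scaled]
```

Fit `lo` and `hi` on the training targets only, train against `scaled`, and apply `from_unit_range` to the model's predictions to get values back in the original units.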

mshuaibi · March 27, 2019, 8:58pm

I should also note that the output node is just a linear activation function, so is this still the case?

sonnguyen · March 27, 2019, 9:02pm

Yes, the output of multiple linear transforms is still linear.

mshuaibi · March 28, 2019, 2:40pm

My model has hidden layers with tanh activation functions and an output node with a linear activation function, so as not to bound the values the output can take. Hence my confusion as to why I still need to scale my targets.

sonnguyen · March 28, 2019, 3:43pm

You have to either scale your targets or use a better initialization of the output weights.
I am assuming that you are doing regression here.
The key thing is that your network outputs should start near the average of the targets; in your case that is around -2700 if you don’t want to scale your targets. Then training only needs to nudge the outputs up and down a little to match the targets.
If your initial outputs are too far from the targets, the loss is large and, as a result, so is the gradient. After several updates, your weights move too far and fall into the saturation region of the tanh() activation, where the gradient is nearly zero, so they stop learning. I guess this is what happened to your model.
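
To make the size of the effect concrete, here is a small sketch in plain Python. It is not your actual model: it assumes a single tanh hidden layer of 5 units (instead of 2), small uniform random weights, mean squared error, and made-up inputs and targets. It compares the initial loss, and the gradient with respect to the output bias, when that bias starts at 0 versus at the mean of the targets.

```python
import math
import random

random.seed(0)

# Made-up data: inputs in [-1, 1], targets around -2700.
inputs = [-1.0, -0.5, 0.0, 0.5, 1.0]
targets = [-2703.0, -2699.5, -2701.2, -2698.3, -2697.0]

HIDDEN = 5
# Small random weights (an assumption standing in for the real init scheme).
w_in = [random.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
b_in = [0.0] * HIDDEN
w_out = [random.uniform(-0.5, 0.5) for _ in range(HIDDEN)]


def predict(x, out_bias):
    """One tanh hidden layer followed by a linear output node."""
    hidden = [math.tanh(w * x + b) for w, b in zip(w_in, b_in)]
    return sum(w * h for w, h in zip(w_out, hidden)) + out_bias


def mse_and_bias_grad(out_bias):
    """Return the MSE and its derivative with respect to the output bias."""
    errors = [predict(x, out_bias) - t for x, t in zip(inputs, targets)]
    mse = sum(e * e for e in errors) / len(errors)
    grad = sum(2.0 * e for e in errors) / len(errors)
    return mse, grad


target_mean = sum(targets) / len(targets)
loss_zero, grad_zero = mse_and_bias_grad(0.0)          # bias starts at 0
loss_mean, grad_mean = mse_and_bias_grad(target_mean)  # bias starts at the mean

print(f"bias = 0:    loss = {loss_zero:.1f}, dL/db = {grad_zero:.1f}")
print(f"bias = mean: loss = {loss_mean:.3f}, dL/db = {grad_mean:.3f}")
```

Because each tanh output lies in (-1, 1), the hidden part of the prediction here is bounded by the sum of the absolute output weights (at most 2.5), so with a zero bias every prediction starts roughly 2700 away from its target. If the model were written in PyTorch (an assumption; the thread does not say), the equivalent step would be filling the last layer's bias, for example with `torch.nn.init.constant_(output_layer.bias, target_mean)`, where `output_layer` is a hypothetical name for that layer.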

mshuaibi · March 28, 2019, 3:46pm

Ahhh that makes perfect sense now. Thank you so much, that is exactly the case.