Hi @ptrblck, I have a small NN with 4 outputs, and I want those outputs to be strictly in the range -10 to 10. I thought of using tanh at the output and then multiplying it by 10, but I want to know if there is a better way to achieve this, because I found that most of the values I get from tanh are very close to 0 (roughly in the range -0.2 to 0.2), so after multiplying by 10 they only span -2 to 2.
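For reference, here is a minimal sketch of the setup I described (the input size and hidden layer width are just placeholders):

```python
import torch
import torch.nn as nn

class BoundedNet(nn.Module):
    """Small net with 4 outputs, squashed into (-10, 10) via tanh * 10."""

    def __init__(self, in_features=8):
        super().__init__()
        # layer sizes are arbitrary placeholders for illustration
        self.body = nn.Sequential(
            nn.Linear(in_features, 32),
            nn.ReLU(),
            nn.Linear(32, 4),
        )

    def forward(self, x):
        # tanh maps to (-1, 1); scaling by 10 gives (-10, 10)
        return 10 * torch.tanh(self.body(x))

net = BoundedNet()
y = net(torch.randn(5, 8))
# y is guaranteed to stay strictly inside (-10, 10), but in practice
# most values end up clustered near 0, as described above
```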
One thing you could do is remove the nn.Tanh() and just have a nn.Linear module, then wrap the output with torch.clip (see here). That could work!
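Something like this sketch (the layer sizes are just placeholders; torch.clip is an alias of torch.clamp):

```python
import torch
import torch.nn as nn

# plain linear output, no tanh; sizes are arbitrary for illustration
linear = nn.Linear(8, 4)

x = torch.randn(5, 8)
# hard-limit the raw linear output to [-10, 10]
y = torch.clip(linear(x), min=-10, max=10)
```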
I honestly think that rescaling the target to [-1, 1] and using a tanh activation is the most solid way of approaching this.
I would probably not use clipping, since, if I’m not mistaken, it forces the gradient to 0 outside the clipped range.
I think with clipping the gradient would only be zero for outputs outside the [-10, 10] range. Rescaling to [-1, 1] is a solution, but you can have issues with outliers forcing most outputs near the extremes (i.e. ±10); clipping was just a way to circumvent that!
Clipping gradients is not related to your output range, I guess; it just prevents the gradients from exploding.
Clipping the output will, as you said, mess with the gradients outside of the range you have defined, but would work perfectly fine within it. Though in my experience this causes trouble especially with REINFORCE, but potentially with any algorithm.
I’m not sure if I understood you correctly, but what I meant was not just rescaling, but rescaling and using a sigmoid/tanh as the output activation; then you have a smooth, non-zero gradient everywhere and do not have the issue of your values running “out of bounds”. Should have added that to the original post (in fact I will now). Or did I get you wrong?
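To make the rescaling idea concrete, a small sketch (the target values here are made up for illustration): divide the targets by 10 so they match tanh’s (-1, 1) range, train against those, and multiply predictions by 10 at inference.

```python
import torch

# hypothetical targets in the original [-10, 10] range
targets = torch.tensor([-10.0, -2.0, 0.0, 7.5])
scaled_targets = targets / 10        # now in [-1, 1], matching tanh's range

# ... train a network with a tanh output against scaled_targets ...

pred = torch.tanh(torch.randn(4))    # stand-in for a network's tanh output
restored = pred * 10                 # map predictions back to (-10, 10)
```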
Yes, what I meant was that you clip the output to the range [-10, 10] (not the gradients of the outputs). Although clipping the output would in effect set the gradient to 0 for any output with |y| > 10 (as you said).
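You can check this directly: torch.clip passes the gradient through unchanged inside the range but zeroes it for elements outside it.

```python
import torch

# one value inside [-10, 10], two outside
y = torch.tensor([5.0, 15.0, -20.0], requires_grad=True)
clipped = torch.clip(y, -10, 10)
clipped.sum().backward()
print(y.grad)  # tensor([1., 0., 0.])
```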
What I mean is, if you rescale the outputs, you can have issues where outliers affect the rescaling. For example, if most outputs are near, say, -0.5 and a few are near 1, the rescaling will be distorted by the outliers near 1. One way to circumvent that is to clip the outliers and then rescale. A bounded activation function will mitigate this, but it can still be an issue. It’s just something to keep in mind when rescaling data!
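A quick illustration of that distortion (the data and clipping bounds are made up): with min-max rescaling to [-1, 1], a single outlier squashes the bulk of the values against one end, while clipping the outlier first keeps them spread out.

```python
import torch

# bulk of values near -0.5, one outlier near 1
data = torch.tensor([-0.5, -0.45, -0.55, -0.5, 1.0])

def rescale(t):
    # linear map of [t.min(), t.max()] onto [-1, 1]
    return 2 * (t - t.min()) / (t.max() - t.min()) - 1

print(rescale(data))                          # bulk squashed near -1
print(rescale(torch.clip(data, -0.6, 0.0)))   # clip outlier first: bulk spread out
```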