I have built a neural network that predicts 5 continuous values in the range between 0 and 1 from video samples. For the last activation I used the Sigmoid activation function, and as a criterion the MSE loss. Are both of these good choices?
Thanks in advance for the help.
MSELoss is usually the right choice for regression. I would recommend that you always start with MSELoss and only use something different if you have good reason and can show that it works better.
As for the Sigmoid, I would not use it, even though your target values are in the range [0.0, 1.0]. It is true that Sigmoid maps the real line (-inf, inf) to (0.0, 1.0), so it might seem natural; however, this is probably an illusion.
If your target (ground truth) values can be close to (or equal to) 0.0 or 1.0, then the output of your network (before passing it through the Sigmoid) would have to be a very large negative number (for a target close to 0.0) or a very large positive number (for a target close to 1.0), which would be hard for your network to learn.
You can experiment with Sigmoid if you want, but you should only actually use it if you can show that it works better than leaving it out. (I could think of use cases where you would want the Sigmoid, but they would be contrived, or at least very atypical.)
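To make the saturation point concrete, here is a quick illustrative check (plain Python, not from the original post) of how large the pre-Sigmoid value (the logit) has to be for the Sigmoid's output to reach targets near the ends of [0.0, 1.0]:

```python
import math

def logit(p):
    """Inverse of the sigmoid: the pre-activation value needed to output p."""
    return math.log(p / (1.0 - p))

# Mid-range targets need only modest pre-activation values...
print(round(logit(0.5), 3))      # 0.0
print(round(logit(0.9), 3))      # 2.197
# ...but targets near the ends of (0, 1) need very large ones,
# and targets of exactly 0.0 or 1.0 are unreachable (logit is +/- infinity).
print(round(logit(0.999), 3))    # 6.907
print(round(logit(0.999999), 3)) # 13.816
```

So as the targets approach 0.0 or 1.0, the required logits blow up, which is what makes these targets hard for the network to learn through a Sigmoid.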
Thank you very much for the answer and for the clarification. I have just one more question. When you say "…it works better than leaving it out", do you mean to consider just the logits coming out of the last FC layer? Or is it better to substitute the Sigmoid with another activation function, such as ReLU?
Yes, I was speaking only about whether or not you should have a Sigmoid after your final Linear layer. I recommend using the output of your final Linear layer as your predictions and feeding them directly to MSELoss. I was not talking about the non-linear activations between the various layers.
Having said that, some of the lore suggests that ReLU is to be preferred for those between-layer activations (although Sigmoid is a perfectly reasonable non-linear activation).
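A minimal sketch of this setup in PyTorch (layer sizes and batch shapes here are made up for illustration; only the "no activation after the final Linear" structure is the point): ReLU between hidden layers, and the raw output of the final Linear layer fed directly to MSELoss.

```python
import torch
import torch.nn as nn

# Hypothetical minimal regression head: 5 continuous outputs,
# ReLU between hidden layers, NO activation after the final Linear.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 5),  # raw outputs of this layer are the predictions
)

criterion = nn.MSELoss()

features = torch.randn(8, 128)  # batch of 8 feature vectors (stand-in for video features)
targets = torch.rand(8, 5)      # targets in [0.0, 1.0)

predictions = model(features)   # no Sigmoid applied
loss = criterion(predictions, targets)
loss.backward()
```

The predictions are unbounded, but with targets in [0.0, 1.0] the MSE loss itself pushes them into that range during training.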
Thank you Frank, your suggestions were a huge help to me.
I'll remove the Sigmoid from the final layer then.