Hi, from the literature it is known that NNs normally use a Softmax at the output for classification and a Sigmoid for regression.

My question is: is an activation function at the end always required? I am training a U-Net to predict pixel-wise depth in a scene from RGB, and I get better results if I don't use any sigmoid at the end. I was wondering whether there is a logical explanation for that.

There are two main uses of activation functions:

In the inner layers of the network, you need one to go beyond linear maps (i.e. to gain the ability to approximate arbitrary functions);

at the final layer, the purpose is slightly different: it is to “force” the output domain: a softmax ensures you get probability vectors, a sigmoid ensures you get values in [0, 1] (probabilities, intensities, or whatever).

So while it may be useful to use a final activation to move things into the right domain (and it is sometimes required, e.g. if your loss function, like cross entropy, only works on probabilities), there isn’t any such requirement in regression tasks (and in fact, the most classical linear regression doesn’t do anything like that, right?).
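To make the “forcing the domain” idea concrete, here is a minimal plain-Python sketch (the logit values are made up for illustration): a softmax squashes raw outputs into a probability vector, a sigmoid squashes each value into (0, 1), while the raw outputs themselves are unbounded — which is exactly what you want for a regression target like depth that isn’t confined to [0, 1].

```python
import math

def softmax(zs):
    # shift by the max for numerical stability, then normalise
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# hypothetical raw network outputs (logits): arbitrary real numbers
logits = [2.0, -1.0, 0.5]

probs = softmax(logits)
print(sum(probs))                    # sums to 1 (up to float rounding): a valid probability vector
print([sigmoid(z) for z in logits])  # each value squashed into (0, 1)
print(logits)                        # untouched and unbounded: fine for regression, e.g. depth in metres
```

If your depth values can exceed 1, a final sigmoid actively hurts: the network has to fight the saturation of the activation to reach large targets, which is one plausible explanation for your better results without it.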