Activation Function

Hello all, I am a beginner in neural nets and just want to understand activation functions.
Is this usable as an activation function? When testing the two snippets below, the loss of the first one decreases much less than that of the second one.

self.sigmoid(self.relu(x))

instead of

self.sigmoid(x)

Essentially, any non-linearity you introduce will work as an activation function, so there’s no reason why sigmoid(relu(x)) can’t work. But some activation functions simply perform better than others for a given task.
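
For context, here is a minimal sketch of how the two variants might sit in a forward pass. The module, layer size, and the `use_relu_before_sigmoid` flag are made up for illustration; only the two activation expressions come from the question.

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Toy binary classifier used only to compare the two activation variants."""
    def __init__(self, in_features=10, use_relu_before_sigmoid=False):
        super().__init__()
        self.fc = nn.Linear(in_features, 1)   # hypothetical single linear layer
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        self.use_relu_before_sigmoid = use_relu_before_sigmoid

    def forward(self, x):
        x = self.fc(x)
        if self.use_relu_before_sigmoid:
            # Variant 1: sigmoid(relu(x)) -- outputs can never fall below 0.5
            return self.sigmoid(self.relu(x))
        # Variant 2: plain sigmoid(x) -- outputs cover the full (0, 1) range
        return self.sigmoid(x)

x = torch.randn(4, 10)
print(TinyClassifier(use_relu_before_sigmoid=True)(x))
print(TinyClassifier(use_relu_before_sigmoid=False)(x))
```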

I’m pretty new to this stuff also, but if I understand correctly, the first implementation, i.e. sigmoid(relu(x)), effectively constrains the output of the sigmoid to the range [0.5, 1) rather than (0, 1). The relu function returns 0 for any x <= 0, and the sigmoid of 0 is exactly 0.5, so the composed function can never output anything below 0.5. For most models, I would think just sigmoid(x) would be more effective, since sigmoid is often used as the final activation of a forward pass to return, for example, the “probability” that a sample possesses each label in an array of multiple labels, but as @ayalaa2 said, it depends on your use case.
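
Here is a quick numeric check of that range argument (the input values are arbitrary, chosen only to probe both sides of zero):

```python
import torch

x = torch.linspace(-5, 5, steps=11)

plain = torch.sigmoid(x)                   # spans roughly (0, 1): ~0.0067 to ~0.9933
relu_first = torch.sigmoid(torch.relu(x))  # never drops below 0.5, since relu(x) >= 0

print(plain.min().item(), plain.max().item())            # ~0.0067, ~0.9933
print(relu_first.min().item(), relu_first.max().item())  # 0.5, ~0.9933
```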

Thanks so much for your reply and for the explanation.