Which activation function for hidden layer?

I have a single hidden layer in my network and 15 nodes in the output layer (one per class).
After applying nn.Linear to my inputs, I apply the sigmoid function for the hidden layer, then use nn.Linear on the hidden-layer output to get the output-layer inputs. I feed those output-layer inputs into CrossEntropyLoss.
What I am confused about is which activation function I should be using from the input to the hidden layer. Is sigmoid fine for this?

You should probably be using

torch.nn.ReLU()

Sigmoid is not a great choice for a hidden-layer activation: it saturates for inputs far from zero, so its gradients become very small and training slows down (the vanishing-gradient problem). ReLU keeps the gradient at 1 for positive inputs and is the usual default between layers. Also note that CrossEntropyLoss expects raw logits, so you are right not to put any activation (softmax or otherwise) after the final Linear layer.
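A minimal sketch of the setup you describe, with ReLU swapped in for sigmoid (the layer sizes and batch size here are made-up placeholders; substitute your own):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
n_inputs, n_hidden, n_classes = 20, 64, 15

model = nn.Sequential(
    nn.Linear(n_inputs, n_hidden),
    nn.ReLU(),                       # ReLU between the layers instead of sigmoid
    nn.Linear(n_hidden, n_classes),  # raw logits; no softmax here
)

criterion = nn.CrossEntropyLoss()    # applies log-softmax internally

x = torch.randn(8, n_inputs)                     # dummy batch of 8 samples
targets = torch.randint(0, n_classes, (8,))      # dummy class labels
loss = criterion(model(x), targets)
```

The only change from your description is `nn.ReLU()` in place of the sigmoid; everything else (two `nn.Linear` layers feeding `CrossEntropyLoss`) stays the same.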
