I am using a feed-forward neural network with two hidden layers, trained with the SGD optimizer and CrossEntropyLoss.
The dataset has 4 classes and the majority class is 67% of the dataset.
When I use ReLU in the hidden layers, I get very good accuracy (and precision as well).
But when I use sigmoid, it performs poorly and mostly just predicts the majority class.
Why is there such a big contrast?
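For reference, here is a minimal sketch of the setup I am describing, assuming PyTorch (layer sizes, input dimension, and learning rate are placeholders; only the activation differs between the two runs):

```python
import torch
import torch.nn as nn

def make_model(activation: nn.Module,
               in_dim: int = 20, hidden: int = 64, n_classes: int = 4) -> nn.Sequential:
    # Two hidden layers; swap nn.ReLU() for nn.Sigmoid() to reproduce the contrast.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), activation,
        nn.Linear(hidden, hidden), activation,
        nn.Linear(hidden, n_classes),  # raw logits; CrossEntropyLoss applies log-softmax
    )

model = make_model(nn.ReLU())  # vs. make_model(nn.Sigmoid())
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One illustrative training step on random data
x = torch.randn(8, 20)
y = torch.randint(0, 4, (8,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```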