I am using a feed-forward neural network with two hidden layers, trained with the SGD optimizer and CrossEntropyLoss.
The dataset has 4 classes and the majority class is 67% of the dataset.
When I use ReLU in the hidden layers, I get very good accuracy (and precision as well).
But when I use sigmoid, it performs poorly and mostly just predicts the majority class.
Why is there such a big contrast?
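For reference, here is a minimal sketch of the setup I am describing, assuming PyTorch (layer sizes, input dimension, and learning rate are placeholders; only the activation differs between the two runs):

```python
import torch
import torch.nn as nn

def make_model(activation: nn.Module,
               in_dim: int = 20, hidden: int = 64, n_classes: int = 4) -> nn.Sequential:
    # Two hidden layers; swap nn.ReLU() for nn.Sigmoid() to reproduce the contrast.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), activation,
        nn.Linear(hidden, hidden), activation,
        nn.Linear(hidden, n_classes),  # raw logits; CrossEntropyLoss applies log-softmax
    )

model = make_model(nn.ReLU())  # vs. make_model(nn.Sigmoid())
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One illustrative training step on random data
x = torch.randn(8, 20)
y = torch.randint(0, 4, (8,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```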