Why softmax training is more stable (than sigmoid)

I’m wondering which activation function will be easier to train with (i.e., give better accuracy / smaller loss) for a multiclass classification problem: softmax or sigmoid.

According to: https://www.quora.com/What-are-the-benefits-of-using-a-softmax-function-instead-of-a-sigmoid-function-in-training-deep-neural-networks

Training a model for multiclass classification with softmax is more stable than training with sigmoid.

  • Why is this true?
  • Is it easier to train a model (and get better results) with softmax instead of sigmoid?
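For context, here is a minimal NumPy sketch (my own illustration, not from the linked Quora answer) of one structural difference behind the stability claim: softmax normalizes the logits into a single distribution, so the class scores compete with each other, while per-class sigmoids are independent and their outputs need not sum to 1.

```python
import numpy as np

def softmax(z):
    # Subtract the max logit before exponentiating for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 1.0, 0.1])

p_soft = softmax(logits)
p_sig = sigmoid(logits)

# Softmax outputs are coupled: they always sum to 1, so raising one
# class's score necessarily lowers the others' probabilities.
print(p_soft.sum())  # 1.0

# Sigmoid treats each class independently; the per-class "probabilities"
# do not sum to 1 (here the sum is well above 1), so nothing forces the
# classes to compete during training.
print(p_sig.sum())
```

This competition between classes is what cross-entropy over softmax exploits for single-label multiclass problems; independent sigmoids are the natural choice only when labels are not mutually exclusive (multi-label classification).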

Hi Amit!

My reply to your other post probably answers this question. See:

Best.

K. Frank
