Softmax or log_softmax, how to choose?

Log softmax gives you the log “probability”. It is useful when you want to optimize something that involves that quantity, e.g. KL divergence. Also, summing logs is more numerically stable than multiplying the original values, so it can also be helpful if your network parameterizes a conditional probability that is only one factor of a larger model, where many such factors get combined.