I have been making some checks on softmax, log-softmax, and the negative log likelihood in PyTorch, and I have seen some apparent inconsistencies. As an example, suppose a logit output for the CIFAR-100 dataset in which one of the classes has a very high logit compared with the rest. Here the softmax function outputs probability 1 for that class and 0 for the rest, so we should expect a cross-entropy error of 0 when we predict the true label and a very large value when we do not (since the cross-entropy computes -log(softmax[class])).
However, I have realized that if I perform a log_softmax operation from the nn module (where I should get a 0 where the softmax has 1, and negative infinity, or at least a very negative value since we presumably avoid computing the logarithm of 0, for the rest), I get an inconsistency. In this case the log-softmax outputs a 0 for the class with high probability (as expected) but returns different, very negative numbers for the rest. This seems inconsistent for two reasons:
- First: if one class has probability 1 and the rest 0, we should expect that class to have a log-softmax of 0 and all the remaining classes to share the same log probability.
- Second: if we assume that the softmax output is merely rounded to 1 (but we really have, say, 0.999999 for that class and values like 0.000000001 or 0.0000000000009 for the rest), then we could not have an exact 0 in the log-softmax output; we should expect a value only near zero. Here are some of the outputs:
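For reference, this is my understanding of how a numerically stable log-softmax is computed (the log-sum-exp trick, which I believe is what nn.LogSoftmax does internally); the logits here are small made-up values, not my actual model output:

```python
import math

def log_softmax(logits):
    # Subtract the max logit before exponentiating so exp() never
    # overflows; this is the standard log-sum-exp trick.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def softmax(logits):
    # Softmax recovered by exponentiating the stable log-softmax.
    return [math.exp(v) for v in log_softmax(logits)]

# Toy logits where one class dominates by a wide margin.
logits = [10.0, 2000.0, 30.0]
print(softmax(logits))      # → [0.0, 1.0, 0.0]
print(log_softmax(logits))  # → [-1990.0, 0.0, -1970.0]
```

With this formulation the dominant class gets exactly 0 in log-space while the other entries keep their (different) distances to the max logit, which reproduces the pattern I show below.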
LOGIT SPACE:
[-151881.58 -53958.38 382600.28 -208273.06 -682387.7
313643.06 -174599.31 314737.03 -47761.547 210986.7
-121455.92 65831.29 253933.14 107649.18 -179261.78
-9338.262 -226704.14 -197389.72 -88550.125 -225601.8
12020.757 305235.8 31988.535 -133836.75 -124994.27
124390.14 67518.836 -231378.08 311258. 92127.34
255807.5 531698. -64797.055 -234956.02 145733.86
383663.34 157211.12 410751.75 -307850.53 119320.98
-494586.7 -71108.56 -217024.64 -343667.8 182377.83
-196660.45 378547.53 -226750.02 229103.94 -76420.19
89305.65 800864.4 284610.66 -144088.16 -356096.2
87200.52 -347407.84 -244253.73 -133480.6 219508.03
-145519.03 62401.516 -79842.984 -94347.93 -371417.62
412408.22 -26637.191 120584.336 -247938.69 -58618.914
15230.674 176264.03 -91443.67 150178.55 516807.47
-144580.42 101580.055 302416.16 279529.4 -202979.7
200805.12 -81993.945 72215.734 -25153.984 -8138.0186
339307.25 -78513.84 403537. -385725.25 319416.94
-292361.7 23827.395 -386195.25 126718.26 169128.44
777514.5 473938.72 126203.87 99491.91 -239480.5 ]
OUTPUT FROM nn.SOFTMAX
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0.]
OUTPUT OF LOG SOFTMAX
[[ -952745.94 -854822.75 -418264.1 -1009137.44 -1483252.
-487221.3 -975463.7 -486127.34 -848625.94 -589877.7
-922320.3 -735033.06 -546931.25 -693215.2 -980126.1
-810202.6 -1027568.5 -998254.1 -889414.5 -1026466.2
-788843.6 -495628.56 -768875.8 -934701.1 -925858.6
-676474.25 -733345.56 -1032242.44 -489606.38 -708737.
-545056.9 -269166.38 -865661.44 -1035820.4 -655130.5
-417201.03 -643653.25 -390112.62 -1108714.9 -681543.4
-1295451. -871972.94 -1017889. -1144532.2 -618486.56
-997524.8 -422316.84 -1027614.4 -571760.44 -877284.56
-711558.75 0. -516253.72 -944952.5 -1156960.5
-713663.9 -1148272.2 -1045118.1 -934345. -581356.4
-946383.4 -738462.9 -880707.4 -895212.3 -1172282.
-388456.16 -827501.56 -680280.06 -1048803. -859483.3
-785633.7 -624600.4 -892308.06 -650685.8 -284056.9
-945444.8 -699284.3 -498448.22 -521334.97 -1003844.06
-600059.25 -882858.3 -728648.6 -826018.4 -809002.4
-461557.12 -879378.25 -397327.38 -1186589.6 -481447.44
-1093226. -777037. -1187059.6 -674146.1 -631735.94
-23349.875 -326925.66 -674660.5 -701372.5 -1040344.9 ]]
As we can see, the log-softmax output assigns a 0 to the dominant class, and that is inconsistent: if the probability of every other class is 0, we should expect the same value for all the remaining log-softmax entries (which is exactly what nn.Softmax does, assigning 0 to all of them).
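To tie this back to the loss, here is a plain-Python sketch of what I understand the negative log likelihood does on top of the log-softmax (hypothetical helper names, not the real nn.NLLLoss API):

```python
import math

def log_softmax(logits):
    # Stable log-softmax via the log-sum-exp trick.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def nll_loss(log_probs, target):
    # Negative log likelihood: negate the log-probability of the true class.
    return -log_probs[target]

# Toy logits where class 1 dominates by a wide margin.
lp = log_softmax([10.0, 2000.0, 30.0])
print(nll_loss(lp, 1))  # loss is 0 when we predict the true label
print(nll_loss(lp, 0))  # huge positive loss when we do not
```

So the loss for the predicted class comes out as 0, while the loss for any other class is the (very large) gap shown in the log-softmax dump above.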