so the biggest value in softmax(x) will still be the biggest value in log(softmax(x)) — it is just the value closest to zero.

The softmax function returns probabilities between [0, 1].
The log of these probabilities returns values between [-inf, 0], since log(0) = -inf and log(1) = 0.
That is why the order won't change.
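A quick sanity check of this (assuming PyTorch is available; the logits here are random, just for illustration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(1, 5)

probs = F.softmax(logits, dim=1)          # values in [0, 1], summing to 1
log_probs = F.log_softmax(logits, dim=1)  # values in (-inf, 0]

# log is monotonically increasing, so the ranking is preserved:
assert torch.argmax(probs) == torch.argmax(log_probs)
```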

However, you should use nn.NLLLoss with a log_softmax output,
or nn.CrossEntropyLoss with raw logits if you prefer not to add an extra log_softmax layer to your model.

By "order" I meant that the outputs will keep the same relative order.
I.e. if p0 < p1 for the softmax outputs, then log(p0) < log(p1).

nn.NLLLoss expects the log probabilities, as the loss will be calculated as described here.
Note that nn.CrossEntropyLoss applies internally F.log_softmax and nn.NLLLoss afterwards, which is why it expects raw logits instead.
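That relationship can be checked directly. A small sketch (the logits and targets below are made up for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)        # raw model outputs, N x C
target = torch.tensor([1, 0, 4])

# nn.CrossEntropyLoss on raw logits ...
ce = nn.CrossEntropyLoss()(logits, target)
# ... equals nn.NLLLoss applied to the log-probabilities:
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)

assert torch.allclose(ce, nll)
```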

Yes, I do get that. Yet I have another question: if log-softmax gives values in the range [-inf, 0], that means the values are negative, and the NLLLoss function is -log(y), where y = log_softmax(x), but the log of a negative value isn't defined, so how does it work?

Sorry, but I searched for it in the documentation and didn't find it explicitly mentioned anywhere that the negative of log_softmax goes in as input to the NLLLoss function. I don't know if I'm missing something important, but please check the docs once.

The input given through a forward call is expected to contain log-probabilities of each class. input has to be a Tensor of size either […].
Obtaining log-probabilities in a neural network is easily achieved by adding a LogSoftmax layer in the last layer of your network. You may use CrossEntropyLoss instead, if you prefer not to add an extra layer.

Examples:
import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 3 x 5
input = torch.randn(3, 5, requires_grad=True)
# each element in target has to have 0 <= value < C
target = torch.tensor([1, 0, 4])
output = loss(m(input), target)
output.backward()

Ah, sorry for the confusion; I can see the misunderstanding now.

nn.NLLLoss expects log probabilities, so you should just apply F.log_softmax on your model output (not multiply it by -1!).

The formula posted in my previous post shows how the loss is calculated, but you shouldn't worry about the minus sign, as it will be applied internally for you.
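In other words, a sketch of what nn.NLLLoss does internally (the tensors are made up; this mirrors the default mean reduction):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
log_probs = F.log_softmax(torch.randn(3, 5), dim=1)
target = torch.tensor([1, 0, 4])

# nn.NLLLoss picks the log-probability of the target class
# and applies the minus sign internally:
loss = nn.NLLLoss()(log_probs, target)
manual = -log_probs[torch.arange(3), target].mean()

assert torch.allclose(loss, manual)
```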

The posted example shows how to apply the criterion.

The documentation mentions that log probabilities are expected and gives a code example.

What is missing, and what statement do you think might have helped resolve your misunderstanding? The docs are far from perfect, so feedback is always welcome.

What needs to be given as input and what we get as output is clear; there is no problem understanding that. But if a minus is introduced internally, I think there should be a mention of that, since the screenshot of the documentation you provided above doesn't state it.

If it was mentioned somewhere that "nn.NLLLoss internally multiplies the log-probabilities by a minus", it would have cleared the confusion!

As you said, the biggest value in softmax(x) will be the smallest-magnitude (closest to zero) value in log(softmax(x)).
Does that mean that, to calculate accuracy, the argmin of the predictions should be used for classification in sentiment analysis? I am using NLLLoss to calculate the loss.
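Since log preserves the ordering (the biggest log-probability is still the most likely class, just the value closest to zero), argmax, not argmin, selects the prediction. A small sketch with made-up tensors:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# e.g. 4 samples, 2 sentiment classes
log_probs = F.log_softmax(torch.randn(4, 2), dim=1)
target = torch.tensor([0, 1, 1, 0])

# argmax, not argmin: the largest log-probability is still
# the most likely class, it is just the least negative value.
preds = log_probs.argmax(dim=1)
accuracy = (preds == target).float().mean()
```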

Hi, could you please clear my doubt? I have a multi-label classification problem where each label can be either positive or negative, so each label on its own is binary in nature. The last layer in my network is LogSoftmax, so what loss function should I use? BCELoss? And how do I calculate the class weights? Please help.

When each label is either positive or negative, it is a binary classification or a (2-class) multi-class classification per label.
You can either return two logits (or log probabilities) per label from the model and use nn.CrossEntropyLoss (or nn.NLLLoss), or alternatively return a single logit per label and use nn.BCEWithLogitsLoss.
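For the single-logit route, here is a hedged sketch. The shapes, targets, and the pos_weight heuristic (negatives over positives per label) are assumptions for illustration, not a prescription:

```python
import torch
import torch.nn as nn

# Hypothetical setup: 4 samples, 3 independent binary labels.
# The model returns one raw logit per label (no LogSoftmax at the end).
torch.manual_seed(0)
logits = torch.randn(4, 3)
targets = torch.tensor([[1., 0., 1.],
                        [0., 0., 1.],
                        [1., 1., 1.],
                        [0., 0., 0.]])

# A common class-weight heuristic: pos_weight = #negatives / #positives
# per label, so rare positive labels contribute more to the loss.
pos = targets.sum(dim=0)
neg = targets.shape[0] - pos
pos_weight = neg / pos.clamp(min=1)

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss = criterion(logits, targets)
```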