About CrossEntropyLoss and NLLLoss

As far as I know, nn.CrossEntropyLoss is equivalent to nn.LogSoftmax + nn.NLLLoss,
but nn.CrossEntropyLoss seems to do something with one-hot targets internally.
So if I want to replace nn.CrossEntropyLoss with nn.LogSoftmax + nn.NLLLoss,
do I need to write a one-hot function myself?


Just try it.

import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
ls = nn.LogSoftmax(dim=-1)
nll = nn.NLLLoss()

batch_size = 5
num_classes = 8
x = torch.rand(batch_size, num_classes)        # raw logits
y = torch.randint(num_classes, (batch_size,))  # integer class indices, no one-hot needed

# Both lines print the same value: CrossEntropyLoss == LogSoftmax + NLLLoss
print(ce(x, y))
print(nll(ls(x), y))

PS: Here is the one-hot API if you need it: nn.functional.one_hot.
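
If you do want to see what the one-hot route looks like, here is a minimal sketch (the sizes are made up for illustration) showing that a manual one-hot formulation gives the same value as nn.CrossEntropyLoss fed with plain class indices, so no custom one-hot function is required:

import torch
import torch.nn as nn
import torch.nn.functional as F

batch_size, num_classes = 5, 8
x = torch.rand(batch_size, num_classes)        # raw logits
y = torch.randint(num_classes, (batch_size,))  # integer class indices

# Built-in loss: takes class indices directly.
ce = nn.CrossEntropyLoss()
print(ce(x, y))

# Manual one-hot formulation: -(one_hot * log_softmax) summed over classes, averaged over the batch.
y_onehot = F.one_hot(y, num_classes=num_classes).float()
log_probs = F.log_softmax(x, dim=-1)
print((-(y_onehot * log_probs).sum(dim=-1)).mean())  # same value as above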

Thank you for your help.
The result of print(ce(x, y)) is indeed equal to
print(nll(ls(x), y)),
but I ran into a new problem:
print(ce(ls(x), y)) also gives the same answer.
It seems like the LogSoftmax has no effect?

import torch
import torch.nn.functional as F

x = torch.rand(1, 2, 3, 4)
ls = F.log_softmax(x, dim=-1)
lsls = F.log_softmax(F.log_softmax(x, dim=-1), dim=-1)
print((lsls - ls).abs().max())  # ~0: applying log_softmax twice changes (almost) nothing

In my forward network
the last layer is x = nn.Linear(xxxx, classnums)
and I compute loss = CrossEntropyLoss(x, target).

If after x = nn.Linear(xxxx, classnums)
I add x = F.log_softmax(x), then

loss1 = nn.CrossEntropyLoss()(x, target)
loss2 = nn.NLLLoss()(x, target)
seem to give loss1 == loss2?
That's strange…
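
A minimal sketch of what I mean (the layer sizes, batch size, and classnums here are made up just for illustration):

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
classnums = 4
linear = nn.Linear(16, classnums)        # stand-in for the real last layer
inp = torch.rand(5, 16)
target = torch.randint(classnums, (5,))

x = F.log_softmax(linear(inp), dim=-1)   # log_softmax applied after the last layer

loss1 = nn.CrossEntropyLoss()(x, target)
loss2 = nn.NLLLoss()(x, target)
print(loss1, loss2)                      # the two values match (up to floating point)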

Since LogSoftmax is idempotent, you’ll get the same output as shown by @Eta_C’s example.
Internally nn.CrossEntropyLoss will apply another F.log_softmax on the inputs.

However, I would recommend sticking to:

  • nn.LogSoftmax + nn.NLLLoss or
  • raw logits + nn.CrossEntropyLoss

as applying LogSoftmax twice won't give you any benefit with these loss functions.
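
To make the two recommended pairings concrete, here is a minimal sketch (the toy model and data are made up for illustration) showing that they produce identical losses:

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 8)                 # toy model producing raw logits
inp = torch.rand(4, 16)
target = torch.randint(8, (4,))

# Option 1: raw logits + nn.CrossEntropyLoss
logits = model(inp)
loss_ce = nn.CrossEntropyLoss()(logits, target)

# Option 2: nn.LogSoftmax + nn.NLLLoss
log_probs = nn.LogSoftmax(dim=-1)(model(inp))
loss_nll = nn.NLLLoss()(log_probs, target)

print(loss_ce, loss_nll)                 # identical values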


Although LogSoftmax is idempotent, applying it twice will still introduce floating point precision errors. :rofl: