LSTM: changing CrossEntropyLoss to BCELoss

Hello, I am new to PyTorch, I hope you can help me. Thanks!
My task is binary classification.
For the loss,

criterion = nn.CrossEntropyLoss()

loss = criterion(outputs, y)  # outputs: LSTM scores, y: integer class labels
optimizer.zero_grad()
loss.backward()
optimizer.step()

The shapes of my output and target are (100L, 2L) and (100L,).
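For reference, here is a minimal sketch that reproduces those shapes with random stand-ins (the LSTM itself is omitted, and the syntax is current PyTorch):

import torch
import torch.nn as nn

outputs = torch.randn(100, 2)             # stand-in for the LSTM scores, shape (100, 2)
y = torch.randint(0, 2, (100,))           # stand-in for the binary labels, shape (100,)
print(nn.CrossEntropyLoss()(outputs, y))  # works: (N, C) scores with (N,) integer labels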
However, if I change it to “criterion = nn.BCELoss()”, I get the following error:

Traceback (most recent call last):
  File "lstm-lzd-gpu.py", line 111, in <module>
    loss = criterion(outputs, y) # cross entropy loss
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/loss.py", line 36, in forward
    return backend_fn(self.size_average, weight=self.weight)(input, target)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/thnn/loss.py", line 22, in forward
    assert input.nelement() == target.nelement()
AssertionError

Can anyone help? Thank you very much.

Make sure outputs.size() == y.size() is True.
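For example, a quick sanity check you can drop in right before the loss call (a sketch; outputs and y are your tensors from above):

assert outputs.size() == y.size(), (outputs.size(), y.size())  # fails for (100, 2) vs (100,)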

Thanks for the reminder. :slight_smile:
When I take the maximum over the output and reshape it to (100L, 1),
it has the same size as the target, but I still get an error.
Should the output contain hard labels like “011010101”, or just the raw output from the RNN?

For CrossEntropyLoss you feed a score matrix of shape batch x classes (which softmax converts to a probability matrix of the same shape, i.e. all entries non-negative and each row summing to one) and a vector of target class labels.

For BCELoss you supply the probability p of class 1 (with 1-p being that of class 0) and a class label 0/1 (or a probability, too, if you want).

So this will give the same number twice:

import torch
from torch.autograd import Variable

score = Variable(torch.randn(10, 2))              # random scores for 2 classes
target = Variable((torch.rand(10) > 0.5).long())  # random 0/1 labels
lfn1 = torch.nn.CrossEntropyLoss()
lfn2 = torch.nn.BCELoss()
# softmax(score)[:, 1] is the probability of class 1, so both losses agree:
print(lfn1(score, target), lfn2(torch.nn.functional.softmax(score)[:, 1], target.float()))
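As a side note, if you prefer to work from raw scores, BCEWithLogitsLoss fuses the sigmoid into the loss for numerical stability (a sketch in current PyTorch syntax, without Variable):

import torch

score = torch.randn(10, 2)
target = (torch.rand(10) > 0.5).long()

# sigmoid(s1 - s0) equals softmax([s0, s1])[1], so feeding the score
# difference to BCEWithLogitsLoss matches the two losses above:
lfn3 = torch.nn.BCEWithLogitsLoss()
print(lfn3(score[:, 1] - score[:, 0], target.float()))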

Best regards

Thomas


Thank you very much Tom!
You explained that very clearly. :slight_smile:
I just have one question: do both losses give similar training accuracy in general?

Quick question: my output dimensions are [sequence_length, batch_dim, output_dim/n_classes].
How do I handle the n_classes case here?
Just unsqueeze a dimension and treat it as if it were spatial data? ^^
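For reference, one common way to handle this (a sketch, assuming every time step has its own label) is to flatten the sequence and batch dimensions before the loss, rather than treating it as spatial data:

import torch

seq_len, batch, n_classes = 5, 10, 2
outputs = torch.randn(seq_len, batch, n_classes)         # per-step RNN scores
targets = torch.randint(0, n_classes, (seq_len, batch))  # per-step labels

# Flatten time and batch into one big batch of seq_len*batch examples:
loss = torch.nn.CrossEntropyLoss()(
    outputs.view(-1, n_classes),  # (seq_len*batch, n_classes)
    targets.view(-1),             # (seq_len*batch,)
)

# Alternatively, CrossEntropyLoss also accepts (N, C, d) inputs, so the
# "spatial" route works too if you permute to (batch, n_classes, seq_len):
loss2 = torch.nn.CrossEntropyLoss()(
    outputs.permute(1, 2, 0),     # (batch, n_classes, seq_len)
    targets.permute(1, 0),        # (batch, seq_len)
)
print(loss, loss2)  # same value under the default mean reduction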