LSTM: changing CrossEntropyLoss to BCELoss

Hello, I am new to PyTorch, I hope you can help me. Thanks!
My task is binary classification.
For the loss,

criterion = nn.CrossEntropyLoss()

loss = criterion(outputs, y)  # outputs: LSTM scores, y: integer class labels
optimizer.zero_grad()
loss.backward()
optimizer.step()

The shapes of my output and target are (100L, 2L) and (100L,).
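For reference, here is a minimal sketch that reproduces those shapes with random stand-ins (the LSTM itself is omitted, and the syntax is current PyTorch):

import torch
import torch.nn as nn

outputs = torch.randn(100, 2)             # stand-in for the LSTM scores, shape (100, 2)
y = torch.randint(0, 2, (100,))           # stand-in for the binary labels, shape (100,)
print(nn.CrossEntropyLoss()(outputs, y))  # works: (N, C) scores with (N,) integer labels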
However, if I change it to “criterion = nn.BCELoss()”, I get the following error:

Traceback (most recent call last):
  File "lstm-lzd-gpu.py", line 111, in <module>
    loss = criterion(outputs, y) # cross entropy loss
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/loss.py", line 36, in forward
    return backend_fn(self.size_average, weight=self.weight)(input, target)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/thnn/loss.py", line 22, in forward
    assert input.nelement() == target.nelement()
AssertionError

Can anyone help? Thank you very much.

Make sure outputs.size() == y.size() is True.
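For example, a quick sanity check you can drop in right before the loss call (a sketch; outputs and y are your tensors from above):

assert outputs.size() == y.size(), (outputs.size(), y.size())  # fails for (100, 2) vs (100,)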

Thanks for the reminder. :slight_smile:
When I take the maximum over the output and reshape it to (100L, 1),
it has the same size as the target, but I still get an error.
Should the output contain hard labels like “011010101”, or just the raw output from the RNN?

For CrossEntropyLoss you feed a score matrix of shape batch x classes (which softmax converts to a probability matrix of the same shape, i.e. all entries non-negative and each row summing to one) and a vector of target class labels.

For BCELoss you supply the probability p of class 1 (with 1-p being that of class 0) and a class label 0/1 (or a probability, too, if you want).

So this will give the same number twice:

import torch
from torch.autograd import Variable

score = Variable(torch.randn(10, 2))              # random scores for 2 classes
target = Variable((torch.rand(10) > 0.5).long())  # random 0/1 labels
lfn1 = torch.nn.CrossEntropyLoss()
lfn2 = torch.nn.BCELoss()
# softmax(score)[:, 1] is the probability of class 1, so both losses agree:
print(lfn1(score, target), lfn2(torch.nn.functional.softmax(score)[:, 1], target.float()))
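As a side note, if you prefer to work from raw scores, BCEWithLogitsLoss fuses the sigmoid into the loss for numerical stability (a sketch in current PyTorch syntax, without Variable):

import torch

score = torch.randn(10, 2)
target = (torch.rand(10) > 0.5).long()

# sigmoid(s1 - s0) equals softmax([s0, s1])[1], so feeding the score
# difference to BCEWithLogitsLoss matches the two losses above:
lfn3 = torch.nn.BCEWithLogitsLoss()
print(lfn3(score[:, 1] - score[:, 0], target.float()))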

Best regards

Thomas


Thank you very much Tom!
You explained that very clearly. :slight_smile:
I just have one question: do both losses give similar training accuracy in general?

Quick question: my output dimensions are [sequence_length, batch_dim, output_dim/n_classes].
How do I handle the n_classes case here?
Just unsqueeze a dimension and treat it as if it were spatial data? ^^
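For reference, one common way to handle this (a sketch, assuming every time step has its own label) is to flatten the sequence and batch dimensions before the loss, rather than treating it as spatial data:

import torch

seq_len, batch, n_classes = 5, 10, 2
outputs = torch.randn(seq_len, batch, n_classes)         # per-step RNN scores
targets = torch.randint(0, n_classes, (seq_len, batch))  # per-step labels

# Flatten time and batch into one big batch of seq_len*batch examples:
loss = torch.nn.CrossEntropyLoss()(
    outputs.view(-1, n_classes),  # (seq_len*batch, n_classes)
    targets.view(-1),             # (seq_len*batch,)
)

# Alternatively, CrossEntropyLoss also accepts (N, C, d) inputs, so the
# "spatial" route works too if you permute to (batch, n_classes, seq_len):
loss2 = torch.nn.CrossEntropyLoss()(
    outputs.permute(1, 2, 0),     # (batch, n_classes, seq_len)
    targets.permute(1, 0),        # (batch, seq_len)
)
print(loss, loss2)  # same value under the default mean reduction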