Equivalent of TensorFlow's Sigmoid Cross Entropy With Logits in PyTorch

I am trying to find the equivalent of the sigmoid_cross_entropy_with_logits loss in PyTorch, but the closest thing I can find is MultiLabelSoftMarginLoss.

Can someone direct me to the equivalent loss? If it doesn’t exist, that information would be useful as well so I can submit a suitable PR.


I think it’s class torch.nn.CrossEntropyLoss.

That doesn’t seem to be the case. As per the docs CrossEntropyLoss only takes a single class index. Please correct me if I am wrong.


You’re looking for KLDivLoss, which takes log-probabilities as its input and probabilities as its target. If you have logits, you will need to apply F.log_softmax first.


The objective function formulation is different from the Cross Entropy formulation given in TensorFlow. I don’t think this is the correct loss.


Maybe the answer to this Stack Overflow question is helpful.

In mathematical terms, what exactly do you want to do? That might be easier for people to help you with, rather than trying to port over a TF function?

If you want to do multi-label classification, so do I, but I haven’t figured out yet how to do it in PyTorch. So I’m also interested in your question :smile:




From the implementation details, it would seem that the MultiLabelSoftMarginLoss is indeed the equivalent of the sigmoid_cross_entropy_with_logits loss. Closing this!
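For anyone who wants to check that claim numerically, here is a minimal sketch. The tensors are made-up data, and the "manual" expression is my own spelling of the per-element sigmoid cross entropy, averaged over every entry; it is not taken from either library's source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 3 independent binary labels each.
logits = torch.randn(4, 3)
targets = torch.randint(0, 2, (4, 3)).float()

# MultiLabelSoftMarginLoss averages over classes, then over the batch.
soft_margin = nn.MultiLabelSoftMarginLoss()(logits, targets)

# Per-element sigmoid cross entropy written out by hand, then averaged.
# logsigmoid(-x) equals log(1 - sigmoid(x)).
manual = -(targets * F.logsigmoid(logits)
           + (1 - targets) * F.logsigmoid(-logits)).mean()

print(soft_margin.item(), manual.item())
```

With default reductions the two numbers should agree up to floating-point error, since every sample has the same number of classes.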


Hi @varunagrawal,

did you get MultiLabelSoftMarginLoss to work on a multi-label classification test problem?

I tried applying it to a multi-label MNIST test (each image is labeled by its original class and by the class minus one), but it didn’t work.

I believe you are talking about BCELoss (or http://pytorch.org/docs/nn.html#binary-cross-entropy), but you’d have to apply the sigmoid activation yourself before that.


@AjayTalati I managed to use BCELoss, binary_cross_entropy and MultiLabelSoftMarginLoss on a multi-label problem.

Here is the basic code

import torch.nn.functional as F

# `model` (ending in a sigmoid) and `train_loader` are defined elsewhere.
def train(epoch):
    for batch_idx, (data, target) in enumerate(train_loader):
        # data, target = data.cuda(non_blocking=True), target.cuda(non_blocking=True)  # On GPU
        output = model(data)  # probabilities in [0, 1]
        # target must be a float tensor with the same shape as output
        loss = F.binary_cross_entropy(output, target)
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

And the source is here.

For BCELoss you can use criterion = BCELoss() and then loss = criterion(output, target), but as @Misha_E said, the network must end in a sigmoid activation so that its outputs are probabilities.
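A minimal sketch of that BCELoss setup, in case it helps. The tiny network, the shapes and the random data are placeholders I made up for illustration, not the model from the linked source.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a multi-label network: 10 features in, 3 labels out.
# The sigmoid is the final layer, so outputs are probabilities.
model = nn.Sequential(nn.Linear(10, 3), nn.Sigmoid())

data = torch.randn(8, 10)
target = torch.randint(0, 2, (8, 3)).float()  # BCELoss expects float targets in [0, 1]

criterion = nn.BCELoss()
output = model(data)
loss = criterion(output, target)
loss.backward()  # gradients flow back through the sigmoid into the linear layer

print(loss.item())
```

If the sigmoid were left out, the raw outputs could fall outside [0, 1] and BCELoss would reject them.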


Hi Mamy, @mratsim

thanks a lot for posting your code :smile:

All the best,


Just for anyone else who finds this from Google (as I did): BCEWithLogitsLoss now does the equivalent of sigmoid_cross_entropy_with_logits from TensorFlow. It combines a sigmoid and binary cross entropy in a single, numerically stable operation.
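To illustrate what "numerically stable" means here, below is a plain-Python sketch of the formulation given in the TensorFlow documentation, max(x, 0) - x * z + log(1 + exp(-abs(x))), next to the textbook form. The function names are my own, and this is not meant as a description of how PyTorch implements the loss internally.

```python
import math

def stable_sigmoid_xent(x, z):
    """Sigmoid cross entropy for logit x and label z, stable for large |x|."""
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))

def naive_sigmoid_xent(x, z):
    """Textbook form; breaks down once sigmoid(x) saturates to 0 or 1."""
    p = 1.0 / (1.0 + math.exp(-x))
    return -(z * math.log(p) + (1.0 - z) * math.log(1.0 - p))

# The two forms agree for moderate logits.
print(stable_sigmoid_xent(2.0, 1.0), naive_sigmoid_xent(2.0, 1.0))

# The stable form still works for an extreme logit, where the naive
# form would end up taking log(0).
print(stable_sigmoid_xent(800.0, 0.0))
```

The stable form only ever exponentiates a non-positive number, so it cannot overflow.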


Worth noting that KLDivLoss still needs to be run with reduction='batchmean' to get the “soft cross_entropy” behavior that people are asking for. Surprised this isn’t more clearly documented…
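A small sketch of what I mean, with made-up logits and soft targets. It relies on the identity cross entropy = KL divergence + entropy of the target, so the KL term differs from the soft cross entropy only by a constant that does not depend on the model.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 5 classes, soft (probabilistic) targets.
logits = torch.randn(4, 5)
soft_targets = torch.softmax(torch.randn(4, 5), dim=1)  # each row sums to 1

log_probs = F.log_softmax(logits, dim=1)

# Soft-label cross entropy: sum over classes, mean over the batch.
soft_ce = -(soft_targets * log_probs).sum(dim=1).mean()

# 'batchmean' sums over classes and divides by the batch size, which matches
# the reduction above. kl_div expects log-probabilities as input and
# probabilities as target.
kl = F.kl_div(log_probs, soft_targets, reduction='batchmean')

# Entropy of the targets, reduced the same way.
target_entropy = -(soft_targets * soft_targets.log()).sum(dim=1).mean()

print(soft_ce.item(), (kl + target_entropy).item())
```

With reduction='mean' the result would instead be divided by the total number of elements (batch times classes), so it would no longer line up with the per-sample cross entropy.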