Bootstrapped binary cross-entropy loss in PyTorch

I am trying to implement the loss function from the ICLR paper "Training Deep Neural Networks on Noisy Labels with Bootstrapping". I found that this is implemented in TensorFlow.
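For reference, here is my reading of the two bootstrapping objectives, written as losses to minimize. The sign convention and the symbol names are assumptions on my part: q is the predicted class distribution, t the (possibly noisy) one-hot label, z the one-hot vector of the model's own most confident class, and β the mixing weight.

```latex
\begin{align}
L_{\text{soft}}(q, t) &= -\sum_{k} \bigl[\beta\, t_k + (1 - \beta)\, q_k\bigr] \log q_k \\
L_{\text{hard}}(q, t) &= -\sum_{k} \bigl[\beta\, t_k + (1 - \beta)\, z_k\bigr] \log q_k,
\qquad z_k = \mathbb{1}\bigl[k = \arg\max_i q_i\bigr]
\end{align}
```

Expanding the brackets, the soft loss is β times the ordinary cross entropy plus (1 − β) times the entropy of q, and the hard loss is β times the cross entropy plus (1 − β) times −log max_k q_k.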

These are my implementations, but I do not think they are right. Can anyone help me?

import torch
import torch.nn as nn

class BCE_soft(nn.BCELoss):
    def __init__(self, beta=0.95):
        super().__init__()
        self.beta = beta

    def forward(self, input, target):
        # Soft bootstrapping: mix the label with the predicted probability.
        target = self.beta * target + (1 - self.beta) * input
        # Detach so the mixed target is treated as a constant.
        target = target.detach()
        return super().forward(input, target)

class BCE_hard(nn.BCELoss):
    def __init__(self, beta=0.8):
        super().__init__()
        self.beta = beta

    def forward(self, input, target):
        # Hard bootstrapping: mix the label with the rounded (0/1) prediction.
        z = torch.round(input)
        z = z.detach()
        target = self.beta * target + (1 - self.beta) * z
        target = target.detach()
        return super().forward(input, target)

The reason I think it is wrong is that the new target contains information from input, but we cannot differentiate through that part, because nn.BCELoss requires that its target does not require grad.
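To make the concern concrete, here is a minimal sketch comparing the gradient with and without detaching the mixed target. It uses a hand-written BCE (the helper names `manual_bce` and `soft_bootstrap_bce` are made up for this illustration) so that it does not depend on how `nn.BCELoss` treats targets that require grad. Which of the two gradients is the intended one is a separate question this sketch does not settle.

```python
import torch

def manual_bce(p, t, eps=1e-8):
    # Summed binary cross entropy, written out so autograd can see the target.
    return -(t * torch.log(p + eps) + (1 - t) * torch.log(1 - p + eps)).sum()

def soft_bootstrap_bce(p, y, beta=0.95, detach_target=True):
    # Mixed target: beta * label + (1 - beta) * prediction.
    t = beta * y + (1 - beta) * p
    if detach_target:
        t = t.detach()  # treat the mixed target as a constant
    return manual_bce(p, t)

torch.manual_seed(0)
logits = torch.randn(4, requires_grad=True)
y = torch.tensor([1.0, 0.0, 1.0, 0.0])

losses, grads = {}, {}
for detach_target in (True, False):
    p = torch.sigmoid(logits)  # rebuild the graph for each variant
    loss = soft_bootstrap_bce(p, y, detach_target=detach_target)
    (grads[detach_target],) = torch.autograd.grad(loss, logits)
    losses[detach_target] = loss.detach()

print("same loss value:", torch.allclose(losses[True], losses[False]))
print("grad (detached target):  ", grads[True])
print("grad (target in graph):  ", grads[False])
```

The loss value is identical in both cases; only the gradient changes, because the non-detached variant also differentiates through the (1 - beta) * p part of the target.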


It’s also implemented for Keras.
Here’s a PyTorch version:

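import torch
import torch.nn.functional as F

# predicted: class probabilities (e.g. softmax output), shape (N, C)
# target: class indices, shape (N,)
def soft_loss(predicted, target, beta=0.95):
    # reduction='sum' replaces the deprecated size_average=False;
    # the same epsilon is used in every log to avoid log(0).
    cross_entropy = F.nll_loss(torch.log(predicted + 1e-8), target, reduction='sum')
    soft_reed = -predicted * torch.log(predicted + 1e-8)
    return beta * cross_entropy + (1 - beta) * torch.sum(soft_reed)

def hard_loss(predicted, target, beta=0.8):
    cross_entropy = F.nll_loss(torch.log(predicted + 1e-8), target, reduction='sum')
    m_pred, _ = torch.max(predicted, dim=1, keepdim=True)
    hard_reed = -torch.log(m_pred + 1e-8)
    return beta * cross_entropy + (1 - beta) * torch.sum(hard_reed)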

The loss is summed over the batch, not averaged, so you have to divide by batch_size to get the mean.
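If you would rather start from raw logits and get the batch mean directly, something like the following sketch might work. It is only one possible variant, not the code from the paper: the function names `soft_bootstrap_loss` and `hard_bootstrap_loss` are made up here, and it assumes `logits` has shape (N, C) and `target` holds class indices of shape (N,). Using `log_softmax` avoids the explicit epsilon.

```python
import torch
import torch.nn.functional as F

def soft_bootstrap_loss(logits, target, beta=0.95):
    log_q = F.log_softmax(logits, dim=1)              # log-probabilities, (N, C)
    ce = F.nll_loss(log_q, target, reduction="sum")   # summed cross entropy
    entropy = -(log_q.exp() * log_q).sum()            # summed entropy of q
    return (beta * ce + (1 - beta) * entropy) / logits.size(0)

def hard_bootstrap_loss(logits, target, beta=0.8):
    log_q = F.log_softmax(logits, dim=1)
    ce = F.nll_loss(log_q, target, reduction="sum")
    max_log_q, _ = log_q.max(dim=1)                   # log-prob of the predicted class
    return (beta * ce + (1 - beta) * (-max_log_q).sum()) / logits.size(0)

# Usage with random data (8 samples, 5 classes).
torch.manual_seed(0)
logits = torch.randn(8, 5, requires_grad=True)
target = torch.randint(0, 5, (8,))

loss = soft_bootstrap_loss(logits, target)
loss.backward()  # gradients flow through both terms
```

With beta=1.0 both functions reduce to the usual mean cross entropy, which is a quick way to sanity-check them.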

It looks good. Thanks.