# Own BCELoss implementation gradients deviate slightly from pytorch version

Hi, I have a question regarding a custom loss function. I am trying to design a loss function that suits my purpose (it's related to adversarial attacks). I want to create a function based on BCELoss(), so I started off with my own BCELoss() implementation:

```python
import torch
import torch.nn as nn

class MlaLoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        super(MlaLoss, self).__init__()

    def forward(self, x, y):
        # Clamp the log terms at -100, as nn.BCELoss does, to avoid -inf
        positive_loss = torch.clamp(torch.log(x), min=-100)
        negative_loss = torch.clamp(torch.log(1 - x), min=-100)
        loss = -torch.mean(y * positive_loss + (1 - y) * negative_loss)
        return loss
```

When I compare the loss to PyTorch's own BCELoss() I get the same values; the backpropagated gradients, however, deviate slightly, which degrades my attack performance significantly (if not drastically):

```python
loss1 = nn.BCELoss()
loss2 = MlaLoss()

# Independent leaf copy so each loss accumulates its own gradient
input2 = input1.clone().detach().requires_grad_(True)

cost1 = loss1(input1, target)
cost2 = loss2(input2, target)

cost1.backward()
cost2.backward()

print(torch.sum(cost1 - cost2))              # loss difference
print(torch.sum(input1.grad - input2.grad))  # gradient difference
```

The outputs are as follows:

```
tensor(0., grad_fn=<SumBackward0>)
tensor(-5.5235e-08)
```
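One way to probe whether this is just float32 rounding is to rerun the comparison in float64 with fixed values (a sketch; the probabilities and targets here are made up):

```python
import torch
import torch.nn as nn

# Made-up float64 probabilities, kept away from 0 and 1
x1 = torch.tensor([0.1, 0.3, 0.5, 0.7, 0.9], dtype=torch.float64, requires_grad=True)
x2 = x1.clone().detach().requires_grad_(True)
y = torch.tensor([0.0, 1.0, 0.0, 1.0, 1.0], dtype=torch.float64)

# Built-in BCE
nn.BCELoss()(x1, y).backward()

# Hand-written BCE, as in the custom module above
(-torch.mean(y * torch.clamp(torch.log(x2), min=-100)
             + (1 - y) * torch.clamp(torch.log(1 - x2), min=-100))).backward()

# In float64 the gap shrinks to around machine epsilon,
# which suggests the float32 deviation is pure rounding error
print(torch.max(torch.abs(x1.grad - x2.grad)))
```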

How do I fix this deviation? Any suggestions or help would be very much appreciated!

Hi,
I'm not sure, but you should check the backward method of BCE in the PyTorch C++ source. There are two implementations:

Both of them use `d(L)/d(x) = -w * (y - x) / (x - x^2)` as the backward formula, but they differ in how they apply an epsilon.
The first one uses `d(L)/d(x) = -w * (y - x) / max(x - x^2, EPS)`,
and for the second one the derivative is `d(L)/d(x) = -w * (y - x) / ((1 - x + EPS) * (x + EPS))`.
I'd go with the first one.
BTW, if you want to use the GPU, you should check what the exact implementation is for it.
Still, I don't think this small difference causes your problem; it's within floating-point error range.
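For reference, both variants can be checked against autograd on a plain (unclamped) BCE, as long as the inputs stay away from 0 and 1; a quick sketch (the `EPS` value here is illustrative, not necessarily the one PyTorch uses):

```python
import torch

EPS = 1e-12  # illustrative epsilon, implementation-defined in practice
w = 1.0      # uniform weight

x = torch.tensor([0.2, 0.5, 0.8], requires_grad=True)
y = torch.tensor([0.0, 1.0, 1.0])
n = x.numel()

# Autograd gradient of a plain BCE with mean reduction
loss = -torch.mean(y * torch.log(x) + (1 - y) * torch.log(1 - x))
loss.backward()

with torch.no_grad():
    # Variant 1: -w*(y - x) / max(x - x^2, EPS), divided by n for the mean
    grad1 = -w * (y - x) / torch.clamp(x - x * x, min=EPS) / n
    # Variant 2: -w*(y - x) / ((1 - x + EPS) * (x + EPS)), divided by n for the mean
    grad2 = -w * (y - x) / ((1.0 - x + EPS) * (x + EPS)) / n

    # Away from 0 and 1 both agree with autograd up to float rounding
    print(torch.max(torch.abs(x.grad - grad1)))
    print(torch.max(torch.abs(x.grad - grad2)))
```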

Thank you for your reply. I'm using torch, so the second one applies to my case. Then I guess the backward passes of the operations used in my loss function do not align with the backward implemented here: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/Loss.cpp#L272. Does the backward of a division also use an epsilon?
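One way to probe this directly is to differentiate a bare division with autograd and compare against the exact quotient rule (a minimal check with made-up numbers):

```python
import torch

a = torch.tensor(1.0)
b = torch.tensor(0.25, requires_grad=True)

(a / b).backward()

# Quotient rule: d(a/b)/db = -a / b**2 = -1 / 0.0625 = -16
# (exact even in float32, since all values here are powers of two)
print(b.grad)  # tensor(-16.)
```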

¯\_(ツ)_/¯
Probably yes!