BCELoss Backward When The Input is Zero

I implemented the gradient of the loss by hand.
When I compared my gradient with the one PyTorch computes, they were different, so I used Autograd to write a custom Function with my own backward so I could inspect the gradient directly:

    @staticmethod
    def backward(ctx, grad_output):
        print("grad:", grad_output.shape)
        print(grad_output)

Before writing this, I did not know how to access the gradient of the loss with respect to the output of my previous model, so I could only compute the model's weight gradients and compare those. With the function above I can access the gradient of the loss directly, so I compared it with mine. The interesting thing is that the two values are not equal.
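For illustration, the whole probe looks roughly like this (the class name GradProbe and where I apply it are just placeholders for what I actually do):

import torch

class GradProbe(torch.autograd.Function):
    """Identity in the forward pass; prints the incoming gradient in backward."""

    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        print("grad:", grad_output.shape)
        print(grad_output)
        return grad_output

# inserted between the model output and the loss, e.g.:
# predicted = GradProbe.apply(predicted)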

My loss is a weighted BCELoss, computed with this code:

criterion = nn.BCELoss(reduction='none')
loss = criterion(predicted, heatmaps)
weight_loss = loss * weights
sum_loss = torch.sum(weight_loss, dim=1)
avg_loss = torch.mean(sum_loss)

and for the backward pass, I just call:

avg_loss.backward()
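(An alternative way to read the same gradient without a custom Function, assuming predicted is still attached to the graph, is torch.autograd.grad; this is just a cross-check, not what I originally did:)

# gradient of the scalar loss w.r.t. the network output, without a custom Function
dloss_dx_torch = torch.autograd.grad(avg_loss, predicted, retain_graph=True)[0]
print(dloss_dx_torch.shape)
print(dloss_dx_torch)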

I compute the gradient of BCELoss with this function:

import numpy as np

def grad_BCEloss(x, y, weight):
    # mirror BCELoss's clamping of log to >= -100: where the log would be
    # clamped, drop the corresponding 1/x (or 1/(1-x)) term
    temp_x = np.array([1/num if np.log(num) >= -100 else 0 for num in x])
    temp_1_x = np.array([1/num if np.log(num) >= -100 else 0 for num in 1 - x])

    dloss_dx = weight * (-np.multiply(y, temp_x) + np.multiply(1 - y, temp_1_x))

    return dloss_dx
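To make the comparison reproducible, this is roughly the toy script I use to compare the two gradients (shapes and values are illustrative, not my real data; grad_BCEloss is the function above):

import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

x_vals = torch.rand(1, 121)
x_vals[0, :20] = 0.0                      # mimic the exact zeros in my data
x_t = x_vals.requires_grad_()             # toy predictions in [0, 1)
y_t = torch.zeros(1, 121)                 # toy targets (all zero, as above)
w_t = torch.full((1, 121), 0.0033)        # toy weights

criterion = nn.BCELoss(reduction='none')
loss = criterion(x_t, y_t)
weight_loss = loss * w_t
sum_loss = torch.sum(weight_loss, dim=1)
avg_loss = torch.mean(sum_loss)

# PyTorch's gradient of the weighted loss w.r.t. the predictions
dloss_dx_torch = torch.autograd.grad(avg_loss, x_t)[0]

# my manual gradient on the same values
# (np.log(0) emits a harmless RuntimeWarning here)
dloss_dx_mine = grad_BCEloss(x_t.detach().numpy().ravel(),
                             y_t.numpy().ravel(),
                             w_t.numpy().ravel())

print(dloss_dx_torch.view(-1)[:25])
print(dloss_dx_mine[:25])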

The strange thing is that my function's values and PyTorch's are not the same. These are the input values:

weights: tensor([[0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033,
         0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033,
         0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033,
         0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033,
         0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033,
         0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033,
         0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033,
         0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033,
         0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033,
         0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033,
         0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033,
         0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033,
         0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033,
         0.0033, 0.0033, 0.0033, 0.0033]])


X: tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0674,
         0.0766, 0.0763, 0.0668, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0769, 0.1063, 0.1312, 0.1304, 0.1048, 0.0755, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0674, 0.1063, 0.1725, 0.2489, 0.2460, 0.1684, 0.1038, 0.0659,
         0.0000, 0.0000, 0.0000, 0.0766, 0.1312, 0.2489, 0.5474, 0.4377, 0.2405,
         0.1273, 0.0747, 0.0000, 0.0000, 0.0000, 0.0763, 0.1304, 0.2460, 0.4377,
         0.4289, 0.2378, 0.1265, 0.0744, 0.0000, 0.0000, 0.0000, 0.0668, 0.1048,
         0.1684, 0.2405, 0.2378, 0.1645, 0.1023, 0.0653, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0755, 0.1038, 0.1273, 0.1265, 0.1023, 0.0742, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0659, 0.0747, 0.0744, 0.0653, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000]], grad_fn=<ViewBackward>)


Y: tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0.]])

Each of these is a tensor of size [batch_size, 121, 1]. This is my output:

dloss_dy: [[0.00334308 0.00334308 0.00334308 0.00334308 0.00334308 0.00334308
  0.00334308 0.00334308 0.00334308 0.00334308 0.00334308 0.00334308
  0.00334308 0.00334308 0.00334308 0.00334308 0.00334308 0.00334308
  0.00334308 0.00334308 0.00334308 0.00334308 0.00334308 0.00334308
  0.00334308 0.00334308 0.00358469 0.00362035 0.00361928 0.00358226
  0.00334308 0.00334308 0.00334308 0.00334308 0.00334308 0.00334308
  0.00362143 0.00374088 0.00384777 0.00384422 0.00373434 0.0036161
  0.00334308 0.00334308 0.00334308 0.00334308 0.00358469 0.00374088
  0.00403994 0.00445096 0.00443402 0.0040201  0.00373009 0.00357909
  0.00334308 0.00334308 0.00334308 0.00362035 0.00384777 0.00445096
  0.00738696 0.00594555 0.00440164 0.00383053 0.00361301 0.00334308
  0.00334308 0.00334308 0.00361928 0.00384422 0.00443402 0.00594555
  0.00585394 0.00438616 0.00382722 0.00361199 0.00334308 0.00334308
  0.00334308 0.00358226 0.00373434 0.0040201  0.00440164 0.00438616
  0.00400135 0.00372389 0.00357677 0.00334308 0.00334308 0.00334308
  0.00334308 0.0036161  0.00373009 0.00383053 0.00382722 0.00372389
  0.00361098 0.00334308 0.00334308 0.00334308 0.00334308 0.00334308
  0.00334308 0.00357909 0.00361301 0.00361199 0.00357677 0.00334308
  0.00334308 0.00334308 0.00334308 0.00334308 0.00334308 0.00334308
  0.00334308 0.00334308 0.00334308 0.00334308 0.00334308 0.00334308
  0.00334308]]

This makes sense to me: where x and y are both 0, the gradient equals the weight. But this is PyTorch's gradient:

tensor([[[-0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000,
          -0.0000, -0.0000, -0.0000],
         [-0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000,
          -0.0000, -0.0000, -0.0000],
         [-0.0000, -0.0000, -0.0000, -0.0000, 0.0036, 0.0036, 0.0036, 0.0036,
          -0.0000, -0.0000, -0.0000],
         [-0.0000, -0.0000, -0.0000, 0.0036, 0.0037, 0.0038, 0.0038, 0.0037,
          0.0036, -0.0000, -0.0000],
         [-0.0000, -0.0000, 0.0036, 0.0037, 0.0040, 0.0045, 0.0044, 0.0040,
          0.0037, 0.0036, -0.0000],
         [-0.0000, -0.0000, 0.0036, 0.0038, 0.0045, 0.0074, 0.0059, 0.0044,
          0.0038, 0.0036, -0.0000],
         [-0.0000, -0.0000, 0.0036, 0.0038, 0.0044, 0.0059, 0.0059, 0.0044,
          0.0038, 0.0036, -0.0000],
         [-0.0000, -0.0000, 0.0036, 0.0037, 0.0040, 0.0044, 0.0044, 0.0040,
          0.0037, 0.0036, -0.0000],
         [-0.0000, -0.0000, -0.0000, 0.0036, 0.0037, 0.0038, 0.0038, 0.0037,
          0.0036, -0.0000, -0.0000],
         [-0.0000, -0.0000, -0.0000, -0.0000, 0.0036, 0.0036, 0.0036, 0.0036,
          -0.0000, -0.0000, -0.0000],
         [-0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000,
          -0.0000, -0.0000, -0.0000]]])

Two strange things happen:

  1. Where x is not 0, the two gradients have (approximately) the same value.
  2. Where x is 0, PyTorch's gradient is zero. The first term is zero because of the clamped log, but the second term is (1-y)/(1-x) = 1/1, which gets multiplied by the weight, so I would expect it to equal the weight (see the single-element check below).
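A minimal single-element check (x = 0, y = 0, no weighting) reproduces point 2:

import torch

x = torch.tensor([0.0], requires_grad=True)
y = torch.tensor([0.0])

torch.nn.functional.binary_cross_entropy(x, y).backward()

# my formula predicts (1 - y) / (1 - x) = 1 here, but PyTorch reports a zero
# gradient, matching the -0.0000 entries above
print(x.grad)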

Can anyone explain why this is happening?