Why would BCELoss be >1?

I am playing around with the BCELoss and BCEWithLogitsLoss, and I don’t quite understand why these losses (with `size_average=True`) can ever exceed 1.

A slightly modified version of the random example from the docs never runs into this problem:

```python
import torch
import torch.nn as nn
from torch import autograd

m = nn.Sigmoid()
loss = nn.BCELoss()
input = autograd.Variable(torch.randn(3, 3, 3, 3), requires_grad=True)
target = autograd.Variable(torch.FloatTensor(3, 3, 3, 3).random_(2))
output = loss(m(input), target)
print(output)
```

But if I change input to be

```python
input = autograd.Variable(torch.zeros(3, 3, 3, 3), requires_grad=True)
output = loss(input, target)
print(output)
```

or

```python
input = autograd.Variable(torch.ones(3, 3, 3, 3), requires_grad=True)
output = loss(input, target)
print(output)
```

I can get losses exceeding 1. This holds even if I add/subtract some eps=0.01 to/from the input before evaluating the loss (since a sigmoid never truly outputs exact 0’s and 1’s).

How can this be the case? It’s an average entropy, and entropy is bounded on the [0,1] interval. This happens with both BCELoss (where I skip the sigmoid for the 0’s-and-1’s inputs) and BCEWithLogitsLoss.

I’m starting from a random initialization, so I’d expect largely random predictions (e.g. a loss near 0.5).

Is that true? BCE for one element `x` and target `t` is defined as
`-(t log(x) + (1-t) log(1-x))`
If we take `t -> 1` (from below) and `x -> 0` (from above), this becomes arbitrarily large.
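To see this numerically, here is a minimal pure-Python sketch of that per-element formula (the helper name `bce` is just for illustration, not a PyTorch API):

```python
import math

def bce(x, t):
    """Per-element binary cross-entropy: -(t*log(x) + (1-t)*log(1-x))."""
    return -(t * math.log(x) + (1 - t) * math.log(1 - x))

# A confident but wrong prediction easily exceeds 1 -- this is exactly the
# eps-shifted "all zeros" input from above, scored against a target of 1:
print(bce(0.01, 1.0))  # -log(0.01) ≈ 4.605
```

So the mean over many elements also exceeds 1 whenever enough predictions confidently disagree with their targets.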

That’s a good point. I was conflating Shannon entropy (`-p log p`) with the cross-entropy loss.
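To make that distinction concrete, here is a small sketch (in nats, i.e. natural log; the helper names are mine) comparing the two quantities for a Bernoulli distribution:

```python
import math

def shannon_entropy(p):
    """H(p) = -p*log(p) - (1-p)*log(1-p); maximized at p = 0.5, where it equals log(2)."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def cross_entropy(p, q):
    """H(p, q) = -p*log(q) - (1-p)*log(1-q); unbounded as q moves away from p."""
    return -p * math.log(q) - (1 - p) * math.log(1 - q)

print(shannon_entropy(0.5))       # log(2) ≈ 0.693 -- the maximally "random" loss, not 0.5
print(cross_entropy(0.99, 0.01))  # ≈ 4.56 -- far above 1
```

So a purely random predictor (x = 0.5 everywhere) lands near log(2) ≈ 0.693 rather than 0.5, and cross-entropy, unlike Shannon entropy, has no upper bound.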