I am playing around with BCELoss and BCEWithLogitsLoss, and I don’t quite understand how these losses (with size_average set to True) can ever exceed 1.

A slightly modified version of the random example from the docs never runs into this problem:

```
import torch
from torch import autograd, nn

m = nn.Sigmoid()
loss = nn.BCELoss()
input = autograd.Variable(torch.randn(3, 3, 3, 3), requires_grad=True)
target = autograd.Variable(torch.FloatTensor(3, 3, 3, 3).random_(2))
output = loss(m(input), target)
print(output)
```

But if I change input to be

```
input = autograd.Variable(torch.zeros(3, 3, 3, 3), requires_grad=True)
target = autograd.Variable(torch.FloatTensor(3, 3, 3, 3).random_(2))
output = loss(input, target)
print(output)
```

or

```
input = autograd.Variable(torch.ones(3, 3, 3, 3), requires_grad=True)
target = autograd.Variable(torch.FloatTensor(3, 3, 3, 3).random_(2))
output = loss(input, target)
print(output)
```

I can get losses exceeding 1. This remains true even if I add/subtract some eps = 0.01 to/from the input before evaluating the loss (since a sigmoid never truly outputs exact 0s and 1s).
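For reference, the per-element binary cross-entropy is −[t·log p + (1−t)·log(1−p)] for a predicted probability p and target t. Working it out by hand for the constant inputs above (a pure-Python sketch, no torch, using the standard formula) shows the kind of per-element values that get averaged:

```
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(p, t):
    # per-element binary cross-entropy for predicted probability p, target t
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

# input = zeros through the sigmoid: p = 0.5, loss is ln 2 for either target
print(bce(sigmoid(0.0), 0))  # ~0.6931
print(bce(sigmoid(0.0), 1))  # ~0.6931

# input = ones through the sigmoid: p ~ 0.7311, a mismatched target
# already costs more than 1 on its own
print(bce(sigmoid(1.0), 0))  # ~1.3133

# probabilities fed in directly with eps = 0.01: a mismatched target
# costs -log(0.01), far above 1
print(bce(0.01, 1))  # ~4.6052
```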

How can this be the case? It’s an average cross-entropy, and I thought entropy was bounded to the [0, 1] interval. This happens both with BCELoss (where, for the all-0s and all-1s inputs, I skip the sigmoid and feed the values in directly as probabilities) and with BCEWithLogitsLoss.

Since I’m starting from a random initialization, I’d expect the predictions to be essentially random, giving a loss near 0.5.