The loss calculation is not wrong according to the docs, as seen in the example there, which uses the "logits" (which look like probabilities) for both classes of this multi-class problem.
I am used to entropy being calculated as the sum of -p*log(p) over all classes, and did not take into account that for binary entropy a single probability is enough, since the other one is just 1 - p.
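A minimal numpy sketch of that point (the logits and label are made up for illustration): with two classes the softmax probabilities sum to 1, so passing both class probabilities and passing just one are equivalent.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

logits = np.array([1.2, -0.3])   # illustrative two-class logits
y = 0                            # true class index

# Multi-class form: -sum_c y_c * log(p_c), with p = softmax(logits)
p = softmax(logits)
multi = -np.log(p[y])

# Binary form: one probability q = p[0] suffices, since p[1] = 1 - q
q = p[0]
binary = -(np.log(q) if y == 0 else np.log(1 - q))

print(multi, binary)  # both print the same value
```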