Understanding PyTorch binary cross entropy loss output

I am trying to understand PyTorch by building a basic perceptron with a sigmoid activation, and then training it to classify two data points as either class 0 or class 1.

Data:
input features | label
[3, 2]         | 1
[1, 1]         | 0

Code:

import torch 
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
# set random seed
torch.manual_seed(0)
class Perceptron(nn.Module):
    def __init__(self):
        super(Perceptron,self).__init__()
        self.linear = nn.Linear(in_features=2,out_features=1)
    
    def forward(self,x):
        x = self.linear(x)        # raw score (logit)
        x = torch.sigmoid(x)      # squash to a probability for class 1
        return x

class testData(Dataset):
    def __init__(self):
        super(testData,self).__init__()
        # two hard-coded examples: ([features], label)
        test_data = [([3,2],1),([1,1],0)]
        self.data = test_data
    
    def __getitem__(self,index):
        dp,label = self.data[index]
        dp = torch.FloatTensor(dp)
        label = torch.tensor(label)
        return dp,label
    
    def __len__(self):
        return len(self.data)

def main():
    model = Perceptron()
    dataset = testData()
    dataloader = DataLoader(dataset,batch_size=2)
    for idx,batch in enumerate(dataloader):
        dp,label = batch
        print('data',dp)
        print('label',label)
        preds = model(dp)
        print('preds',preds)
        loss = F.binary_cross_entropy(preds.float(),label.unsqueeze(1).float())
        print('loss',loss)
    # loss = F.cross_entropy(preds,label)
    # print(loss)

if __name__ == "__main__":
    main()

Output

data tensor([[3., 2.],
        [1., 1.]])
label tensor([1, 0])
preds tensor([[0.5401],
        [0.4482]], grad_fn=<SigmoidBackward>)
loss tensor(0.6053, grad_fn=<BinaryCrossEntropyBackward>)

If I understand the binary cross entropy formula correctly, the output should be:

-(1 * log(0.5401) + (1 - 0) * log(1 - 0.4482))

Following this previous post, I am using log base e, and I get a final answer of 1.21, not the 0.6053 given above. What is being calculated here?
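
For reference, this is the same hand calculation in Python, plugging in the two preds values printed above and using math.log (base e):

import math

p1, p2 = 0.5401, 0.4482                                   # sigmoid outputs printed above
bce_sum = -(1 * math.log(p1) + (1 - 0) * math.log(1 - p2))
print(bce_sum)                                            # ~1.21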

What is happening is that F.binary_cross_entropy has a reduction argument, which you can see in the docs. The default is 'mean'; the other options are 'sum' and 'none'. Your hand calculation is the sum over the batch, while the PyTorch loss function averages it. Divide your result by the number of outputs, which in your case is two, and you get the reported loss: 1.21 / 2 ≈ 0.605.
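
A quick sketch to confirm, reusing the preds values your model printed (hard-coded here rather than recomputed by the model):

import torch
import torch.nn.functional as F

preds = torch.tensor([[0.5401], [0.4482]])    # model outputs from the question
target = torch.tensor([[1.0], [0.0]])

loss_sum = F.binary_cross_entropy(preds, target, reduction='sum')
loss_mean = F.binary_cross_entropy(preds, target, reduction='mean')   # the default
print(loss_sum.item())       # ~1.21, matches the hand calculation
print(loss_mean.item())      # ~0.6053, what the question's code prints
print(loss_sum.item() / 2)   # dividing the sum by the batch size gives the mean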