Confusion with input shapes to BCEWithLogitsLoss()

I accidentally passed a tensor of shape (batch_size, 1, 2, 2) into BCEWithLogitsLoss(), with targets of the same shape. This gave me only one value as the loss for the whole batch, whereas I expected 4 different loss values per sample (since each sample is 1, 2, 2). I wanted to know what PyTorch does internally to produce such an output.

Below is the code I used to test:

import torch
import torch.nn as nn
import torch.nn.functional as F
a = torch.ones(10, 1, 2, 2)   # targets
b = torch.randn(10, 1, 2, 2)  # logits
t_loss = torch.binary_cross_entropy_with_logits(b, a)

This t_loss gave me an output loss of shape (10, 1, 2, 2).

lfunc = nn.BCEWithLogitsLoss()
n_loss = lfunc(b, a)
f_loss = F.binary_cross_entropy_with_logits(b, a)

Here, n_loss and f_loss gave me the same output: a single scalar value.

I was wondering how PyTorch is internally handling this kind of situation.

If you look at the documentation for the implementation, the reduction argument has two reducing modes, 'mean' and 'sum', which return the mean or the sum of all the (10, 1, 2, 2) element-wise loss values; the default is 'mean'.

a = torch.ones(10, 1, 2, 2)
b = torch.randn(10, 1, 2, 2)

torch.binary_cross_entropy_with_logits(a, b)
# shape (10, 1, 2, 2)

To demonstrate, consider this code:

F.binary_cross_entropy_with_logits(a, b, reduction='mean')
#> tensor(1.2240)
F.binary_cross_entropy_with_logits(a, b, reduction='sum')
#> tensor(48.9613)

which is equivalent to

torch.mean(torch.binary_cross_entropy_with_logits(a, b))
#> tensor(1.2240)
torch.sum(torch.binary_cross_entropy_with_logits(a, b))
#> tensor(48.9613)
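
As a side note (a minimal sketch; reduction='none' is the third reduction mode accepted by this function), if you actually want the element-wise losses of shape (10, 1, 2, 2), you can skip the reduction explicitly:

# reduction='none' keeps the element-wise losses instead of reducing them
unreduced = F.binary_cross_entropy_with_logits(a, b, reduction='none')
unreduced.shape
#> torch.Size([10, 1, 2, 2])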

The reason a single value is returned is gradient calculation: we can call loss.backward() directly only on a scalar loss, whereas for a tensor-valued loss we would have to pass an explicit gradient tensor to backward().
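
To illustrate that point, here is a minimal sketch (variable names are just for the example):

import torch
import torch.nn.functional as F

logits = torch.randn(10, 1, 2, 2, requires_grad=True)
targets = torch.ones(10, 1, 2, 2)

# scalar loss (default reduction='mean'): backward() needs no arguments
scalar_loss = F.binary_cross_entropy_with_logits(logits, targets)
scalar_loss.backward()

# element-wise loss (reduction='none'): backward() requires a gradient
# tensor of the same shape, e.g. all ones, which accumulates the same
# gradients as calling .sum().backward()
logits.grad = None
elementwise_loss = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
elementwise_loss.backward(torch.ones_like(elementwise_loss))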

Thank you for your response!