Thanks for the executable code, that was really helpful.
You are accidentally broadcasting the loss, since there is a shape mismatch between the output and target tensors: while your output has the shape [batch_size, 1], the target has the shape [batch_size].
This leads to broadcasting, as seen here:
# your code with the broadcasting
import torch
import torch.nn as nn

output = torch.randn(4, 1)
target = torch.randn(4)
criterion = nn.L1Loss(reduction="none")
loss = criterion(output, target)
print(loss)  # you only want the diagonal
> tensor([[1.0231, 2.3743, 2.4857, 2.3248],
[1.5896, 0.2385, 0.1270, 0.2879],
[1.7572, 0.4061, 0.2946, 0.4555],
[1.2650, 0.0862, 0.1976, 0.0368]])
# fixed
target = target.unsqueeze(1)
loss = criterion(output, target)
print(loss)
> tensor([[1.0231],
[0.2385],
[0.2946],
[0.0368]])
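To see why only the diagonal of the broadcast result is meaningful, you can compare it element-wise against the shape-aligned version (a small sketch using fresh random tensors; the subtraction mirrors what L1Loss computes internally):

```python
import torch

torch.manual_seed(0)
output = torch.randn(4, 1)   # [batch_size, 1]
target = torch.randn(4)      # [batch_size]

# Broadcasting expands the difference to [4, 4]:
# entry [i, j] compares sample i's output with sample j's target
broadcast_loss = (output - target).abs()

# Aligning the shapes gives the intended per-sample loss [4, 1]
aligned_loss = (output - target.unsqueeze(1)).abs()

assert broadcast_loss.shape == (4, 4)
assert torch.equal(broadcast_loss.diagonal().unsqueeze(1), aligned_loss)
```

Only the diagonal entries pair each sample's output with its own target; the off-diagonal values mix samples and silently corrupt the reduced loss.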
You should also get a warning such as:
UserWarning: Using a target size (torch.Size([4])) that is different to the input size (torch.Size([4, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
Use this line of code to calculate the loss and it should work:
loss = criterion(outputs, labels.float().unsqueeze(1))
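Equivalently, you could squeeze the model output instead of unsqueezing the target; both approaches align the shapes and yield the same loss (a sketch with assumed names outputs/labels standing in for your variables):

```python
import torch
import torch.nn as nn

criterion = nn.L1Loss()
outputs = torch.randn(4, 1)          # model output: [batch_size, 1]
labels = torch.randint(0, 2, (4,))   # integer targets: [batch_size]

# Option 1: unsqueeze the target to [batch_size, 1]
loss_a = criterion(outputs, labels.float().unsqueeze(1))
# Option 2: squeeze the output to [batch_size]
loss_b = criterion(outputs.squeeze(1), labels.float())

assert torch.allclose(loss_a, loss_b)
```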
We've all been in this situation, so please don't be discouraged.