Changing the reduction parameter in BCEWithLogitsLoss for a multi-label problem changes the results a lot

Hi,
I trained a deep neural network for multi-label classification. When I used the ‘mean’ reduction in BCEWithLogitsLoss, the results were very low, and when I changed it to ‘sum’, the results got much better. What does this say about my model? Is there a problem with the gradient values in the first setup?
Thanks

Have you played around with some hyperparameters, e.g. have you tried to increase the learning rate? The gradients should be larger when using the 'sum' reduction:

import torch
import torch.nn as nn
import torchvision.models as models

# mean
torch.manual_seed(2809)
model = models.resnet18()
target = torch.randint(0, 2, (1, 1000)).float()
criterion = nn.BCEWithLogitsLoss(reduction='mean')
output = model(torch.randn(1, 3, 224, 224))
loss = criterion(output, target)
loss.backward()
print(model.conv1.weight.grad.sum())
> tensor(-0.1137)

# sum
torch.manual_seed(2809)
model = models.resnet18()
target = torch.randint(0, 2, (1, 1000)).float()
criterion = nn.BCEWithLogitsLoss(reduction='sum')
output = model(torch.randn(1, 3, 224, 224))
loss = criterion(output, target)
loss.backward()
print(model.conv1.weight.grad.sum())
> tensor(-113.7127)
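
The factor of ~1000 between the two gradient sums just reflects the number of loss elements (1 sample x 1000 outputs), since the 'mean' loss is the 'sum' loss divided by the element count. A quick sketch to verify this relationship (independent of the resnet example above):

import torch
import torch.nn as nn

logits = torch.randn(1, 1000)
target = torch.randint(0, 2, (1, 1000)).float()

loss_mean = nn.BCEWithLogitsLoss(reduction='mean')(logits, target)
loss_sum = nn.BCEWithLogitsLoss(reduction='sum')(logits, target)

# 'sum' equals 'mean' scaled by the number of elements (up to floating point error)
print(torch.allclose(loss_mean * target.numel(), loss_sum))
> True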

Yes, I changed the learning rate from 0.01 to 0.5 and the results with ‘mean’ don’t get better at all. I have absolutely no idea why this happens.

What are the loss values you get for mean and sum reductions?
@ptrblck Does it make sense to use a higher learning rate in the case of the mean reduction, since the loss and gradients could be small? Or to use a constant multiplier on the loss to magnify it?

I would use the 'mean' reduction as the default value and try out different learning rates.
Otherwise your training (and learning rate) will depend on the batch size.

However, if a summed loss gives better convergence than the mean, a possible explanation would be the increased scale of the gradients.
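
To illustrate the batch size dependence, here is a small sketch (random data, 39 labels as in your use case): with 'sum' the loss grows roughly linearly with the batch size, while with 'mean' it stays on the same scale, so a learning rate tuned for one batch size wouldn't transfer to another:

import torch
import torch.nn as nn

criterion_mean = nn.BCEWithLogitsLoss(reduction='mean')
criterion_sum = nn.BCEWithLogitsLoss(reduction='sum')

for batch_size in [1, 8, 64]:
    logits = torch.randn(batch_size, 39)
    target = torch.randint(0, 2, (batch_size, 39)).float()
    # 'mean' stays on roughly the same scale, 'sum' grows with the batch size
    print(batch_size,
          criterion_mean(logits, target).item(),
          criterion_sum(logits, target).item())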


Thanks a lot for your help.

When I use mean, the training loss starts at ‘1.3166’, and when I use sum, it starts at ‘264594.4062’ (the dataset has 39 labels), which doesn’t make sense to me.

It could be that the learning rate is too low for the mean reduction. Try a higher learning rate to see if it converges.

I tried increasing and decreasing the learning rate and nothing has changed.

Does your loss remain constant (or undergo very minimal change) when the mean reduction is used?

When I set the lr to 0.001, the training loss decreases very slowly (approximately 0.0004 in every iteration) but the test loss stays the same. When I set it to 0.1, the training loss decreases less slowly (about 0.002 in each iteration), and the test loss fluctuates up and down.

Try scaling up your loss by some constant. Also try an even higher learning rate, such that your training loss doesn’t oscillate too much across iterations.
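
If you want to try the constant scaling, a minimal sketch could look like this (the toy model, sizes, and the factor 100 are made-up placeholders; note that for plain SGD this is equivalent to scaling the learning rate by the same factor):

import torch
import torch.nn as nn

# toy setup just for illustration
model = nn.Linear(10, 39)
criterion = nn.BCEWithLogitsLoss(reduction='mean')
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scale = 100.0  # arbitrary constant multiplier, tune together with the learning rate

data = torch.randn(8, 10)
target = torch.randint(0, 2, (8, 39)).float()

optimizer.zero_grad()
loss = criterion(model(data), target) * scale  # magnifies the loss and hence the gradients
loss.backward()
optimizer.step()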