Using weighted NLLLoss: how to define a good loss

Hi, I have very imbalanced data where the class weights fall in the range [0.0000000012, 1]. When I use the weighted NLLLoss and apply these extremely small weights directly, the loss becomes very small, but I would like the loss to stay on the same scale as in the unweighted case. What is a good strategy for assigning the weights?
In other words:
1. What should my weights look like (for example, should they all be > 0, etc.)?
2. If the weighted NLLLoss is used, do I need to multiply the loss by a factor to get it on the same scale as the unweighted case?

An example:

import torch
import torch.nn as nn

# unweighted loss for a single sample of class 0
log_prob = torch.tensor([[-0.0141, -4.2669]])
target = torch.tensor([0])
criterion = nn.NLLLoss()

criterion(log_prob, target)

out: tensor(0.0141)

# same sample, but with an extremely small weight for class 0 and reduction='sum'
log_prob = torch.tensor([[-0.0141, -4.2669]])
target = torch.tensor([0])
weight = torch.tensor([0.00009, 0.99991])
criterion = nn.NLLLoss(weight=weight, reduction='sum')

criterion(log_prob, target)

out: tensor(1.2690e-06)

So in the second case, how can I tell whether my network is training well, given that the loss value is already very small from the start?

Thank you

And in this case, how should I choose the learning rate? Assuming the learning rate was 0.001 before, should I use a different one when using the weighted NLLLoss?

The loss will be rescaled with the weights as described in the docs if you keep reduction='mean'. As you can see in the formula, the weighted per-sample losses are summed and then divided by the sum of the corresponding weights.

This also means you shouldn’t have to change the learning rate or other parameters.
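
For example, reusing the numbers from your post (just a quick sketch, not part of the original example), the default reduction='mean' normalizes by the weights that were actually used, so the single-sample loss comes out on the same scale as the unweighted value:

import torch
import torch.nn as nn

log_prob = torch.tensor([[-0.0141, -4.2669]])
target = torch.tensor([0])
weight = torch.tensor([0.00009, 0.99991])

# default reduction='mean': weighted sum divided by the sum of the used weights
criterion = nn.NLLLoss(weight=weight)
print(criterion(log_prob, target))
> tensor(0.0141)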

Is there a reason you are using reduction='sum'?
If you need to use it, you would have to rescale the loss manually.
I don’t think changing other hyperparameters such as the learning rate would be a good idea, since your “summed” loss will depend on the current class distribution in the batch and might thus mess up your training.
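If you do want to keep reduction='sum', a manual rescaling could look roughly like this (a sketch: dividing by the sum of the weights of the targets in the batch recovers the 'mean' behavior):

import torch
import torch.nn as nn

log_prob = torch.tensor([[-0.0141, -4.2669],
                         [-0.0141, -4.2669]])
target = torch.tensor([0, 1])
weight = torch.tensor([0.00009, 0.99991])

loss_sum = nn.NLLLoss(weight=weight, reduction='sum')(log_prob, target)

# divide by the sum of the per-sample weights to match reduction='mean'
loss_rescaled = loss_sum / weight[target].sum()

print(loss_rescaled)
print(nn.NLLLoss(weight=weight, reduction='mean')(log_prob, target))  # same value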

Thank you ptrblck. I noticed this post on GitHub and thought I should use

reduction = 'sum'

If I understand correctly, you mean that if I leave reduction='mean', it will be fine and I won’t need to worry about the loss value or the learning rate.
Thank you, it helps me a lot. :smiley:

Thanks for the link.
I think there might be a misunderstanding in the issue.
The weight will be canceled out if you only provide a single sample. However, if you provide a batch, the weight will be applied and the loss will be normalized using the corresponding weights as described in the docs:

log_prob = torch.tensor([[-0.0141, -4.2669],
                         [-0.0141, -4.2669]])
target = torch.tensor([0, 1])
weight = torch.tensor([2.0, 3.0])
criterion = nn.NLLLoss()
criterion_weighted = nn.NLLLoss(weight=weight)

print(criterion(log_prob, target))
> tensor(2.1405)
print(criterion_weighted(log_prob, target))
> tensor(2.5658)
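
For reference (continuing the snippet above), the weighted value can be reproduced manually from the formula in the docs:

# manual check of the two values above
picked = -log_prob[torch.arange(len(target)), target]  # per-sample loss terms
w = weight[target]                                      # weight of each sample's target class

print(picked.mean())                 # (0.0141 + 4.2669) / 2 = 2.1405
print((w * picked).sum() / w.sum())  # (2 * 0.0141 + 3 * 4.2669) / (2 + 3) = 2.5658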

Yes, your understanding is correct. :wink:
