Thanks for the link.
I think there might be a misunderstanding in the issue.
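If you provide only a single sample, the weight cancels out: with the default reduction='mean', the loss is divided by the sum of the weights of the target classes, which for one sample is that same weight. If you provide a batch, however, the weight is applied and the loss is normalized by the corresponding weights, as described in the docs: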
import torch
import torch.nn as nn

# Log-probabilities for two samples and two classes
log_prob = torch.tensor([[-0.0141, -4.2669],
                         [-0.0141, -4.2669]])
target = torch.tensor([0, 1])
weight = torch.tensor([2.0, 3.0])  # one weight per class

criterion = nn.NLLLoss()
criterion_weighted = nn.NLLLoss(weight=weight)

print(criterion(log_prob, target))
> tensor(2.1405)
print(criterion_weighted(log_prob, target))
> tensor(2.5658)
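To make the normalization explicit, here is a minimal sketch in plain Python that recomputes the weighted mean by hand, assuming the default reduction='mean'. The helper name weighted_nll is hypothetical and only for illustration; it is not part of PyTorch.

```python
# Manual re-computation of the weighted mean that nn.NLLLoss applies
# with reduction='mean'. weighted_nll is a hypothetical helper.
log_prob = [[-0.0141, -4.2669],
            [-0.0141, -4.2669]]
target = [0, 1]
weight = [2.0, 3.0]

def weighted_nll(log_prob, target, weight):
    # Numerator: each sample's loss scaled by the weight of its target class.
    num = sum(-weight[t] * row[t] for row, t in zip(log_prob, target))
    # Denominator: sum of the target-class weights over the batch.
    den = sum(weight[t] for t in target)
    return num / den

# Batch of two samples: (2 * 0.0141 + 3 * 4.2669) / (2 + 3)
batch_loss = weighted_nll(log_prob, target, weight)
# Single sample: (3 * 4.2669) / 3, so the weight cancels out
single_loss = weighted_nll(log_prob[1:], target[1:], weight)

print(round(batch_loss, 4))   # 2.5658
print(round(single_loss, 4))  # 4.2669
```

The batch value matches the weighted criterion's output, while the single-sample value equals the unweighted loss for that sample.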
Yes, your understanding is correct. 