Per-class and per-sample weighting

That sounds right!
I’m not sure what the S samples are in your example, but here is a small dummy code snippet showing what I mean:

import torch
import torch.nn as nn

batch_size = 10
nb_classes = 2

model = nn.Linear(10, nb_classes)
# Per-class weights passed to the criterion; reduction='none' keeps the per-sample losses
weight = torch.empty(nb_classes).uniform_(0, 1)
criterion = nn.CrossEntropyLoss(weight=weight, reduction='none')

# This would be returned from your DataLoader
x = torch.randn(batch_size, 10)
target = torch.empty(batch_size, dtype=torch.long).random_(nb_classes)
sample_weight = torch.empty(batch_size).uniform_(0, 1)

output = model(x)
# Unreduced loss has shape [batch_size], so it can be scaled per sample
loss = criterion(output, target)
loss = loss * sample_weight
loss.mean().backward()

Do you mean that each batch has a different size, or what exactly are your samples?
Could you post a random tensor showing one sample batch?

EDIT: It’s probably also a good idea to normalize the sample weights so that the loss stays in approximately the same range and doesn’t depend on the current sample distribution in your batch.

loss = (loss * sample_weight / sample_weight.sum()).sum()

I’m not sure what range your weights are in, so maybe it’s not necessary.
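
For completeness, here is a minimal sketch of the normalized version in context (same dummy setup as above; the uniform weight ranges are just placeholders for whatever your real class and sample weights are):

import torch
import torch.nn as nn

batch_size, nb_classes = 10, 2
model = nn.Linear(10, nb_classes)
weight = torch.empty(nb_classes).uniform_(0, 1)
criterion = nn.CrossEntropyLoss(weight=weight, reduction='none')

x = torch.randn(batch_size, 10)
target = torch.empty(batch_size, dtype=torch.long).random_(nb_classes)
sample_weight = torch.empty(batch_size).uniform_(0, 1)

output = model(x)
loss = criterion(output, target)  # per-sample losses, shape [batch_size]
# Weighted sum normalized by the total sample weight in the batch
loss = (loss * sample_weight / sample_weight.sum()).sum()
loss.backward()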
