Custom loss for single-label, multi-class problem

I have a single-label, multi-class classification problem, i.e., a given sample belongs to exactly one class (say, class 3), but for training purposes, predicting class 2 or 5 is also acceptable and should not be penalised as heavily.

For example, the ground truth for one sample is [0,1,1,0,1] over 5 classes, instead of a one-hot vector. This implies that the model predicting any one of these classes (2, 3, or 5) is fine.

For every batch, the predicted output has shape bs x n x nc, where bs is the batch size, n is the number of vector points per sample, and nc is the number of classes. The ground truth has the same shape as the predicted tensor.

For every batch, I’m expecting my loss function to compare n tensors across nc classes and then average them over n.

E.g., when the dimensions are 32 x 8 x 5000: there are 32 batch points in a batch (bs = 32), each batch point has 8 vector points, and each vector point has 5000 classes. For a given batch point, I wish to compute the loss across all 8 vector points and average it, and do the same for the remaining batch points. The final loss would be the average of the per-batch-point losses.

How can I approach designing such a loss function? Any help would be deeply appreciated

P.S.: Let me know if the question is ambiguous

Hi Saswat!

You’ve never really told us what your use case is, that is, what real-world
problem you’re trying to solve or what your input data, predictions, and
ground-truth labels are supposed to mean.

I doubt that this is really exactly what you want to be doing.

But to answer your specific question:

Taking guidance from cross entropy, I would suggest loss = -log (P_all),
where P_all is the sum of the predicted probabilities for all of the classes
that “are fine.” Note, when you only have a single ground-truth class, this is
just the conventional cross entropy.

(For reasons of numerical stability, it will be best to predict log-probabilities
and use the “log-sum-exp” trick to compute log (P_all).)
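A minimal sketch of this loss in PyTorch, assuming the model outputs raw logits of shape (bs, n, nc) and the ground truth is a multi-hot 0/1 tensor of the same shape (the function name is my own):

```python
import torch
import torch.nn.functional as F

def multi_positive_loss(logits, target):
    """loss = -log(P_all), where P_all is the summed predicted
    probability of all classes that "are fine".

    logits: raw scores, shape (bs, n, nc)
    target: multi-hot 0/1 tensor, same shape
    """
    # log-probabilities via log-softmax (numerically stable)
    log_p = F.log_softmax(logits, dim=-1)            # (bs, n, nc)
    # log(P_all) = logsumexp over the acceptable classes only;
    # mask the other classes out with -inf before the logsumexp
    masked = log_p.masked_fill(target == 0, float("-inf"))
    log_p_all = torch.logsumexp(masked, dim=-1)      # (bs, n)
    # average over the n vector points, then over the batch
    return -log_p_all.mean()
```

When the target is one-hot, only a single finite entry survives the masking, so this reduces to ordinary cross entropy.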


K. Frank

Hi @KFrank
Sorry for the confusion

I’m working on developing an ML-based packer which aims at solving multi-dimensional knapsack problems (i.e., given x items, try to fit them into as few knapsacks as possible). I’m trying to supervise it by using a conventional knapsack solver. The solver gives me the top 50 knapsacks of varying capacities that can suitably pack the given requests.

The input is a set of n requests which need to be fit into the knapsacks. The ground-truth output is the set of bin indices which the solver has given me (hence the multiple 1s).

Let me know if I’m clear on the problem definition

I didn’t quite follow you here. Did you mean to take the exp, divide by the sum of all the exps, and then take the log of it all?

Also, how would you recommend averaging across a particular dimension?

Hi Saswat!

I do not have any intuition about how one might use a neural network to
learn to solve the knapsack problem. (My gut reaction is that it might be
hard and that approaching it as a classification problem doesn’t feel right
to me.)

Take a look at Wikipedia’s explanation of the log-sum-exp trick.

Also, PyTorch offers a built-in logsumexp() function.
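To illustrate both of your questions in one sketch: logsumexp() computes log(sum(exp(x))) in a numerically stable way, and .mean(dim=...) averages across a particular dimension (shapes below follow your 32 x 8 x 5000 example):

```python
import torch

x = torch.randn(32, 8, 5000)

# logsumexp(x) == log(sum(exp(x))), but computed stably
naive = x.exp().sum(dim=-1).log()
stable = torch.logsumexp(x, dim=-1)      # shape (32, 8)

# averaging across a particular dimension: reduce dim=1
# (the 8 vector points), leaving one value per batch point
per_batch_point = stable.mean(dim=1)     # shape (32,)
final = per_batch_point.mean()           # scalar
```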


K. Frank