# BCEWithLogitsLoss pos_weight -- how to do re-weighting at inference

I am training a multilabel classifier on an imbalanced dataset, where I am using `pos_weight` in the loss.

Dataset looks like:

| label | cat1 | cat2 | cat3 |
|-------|------|------|------|
| a     | 100  | 100  | 100  |
| b     | 100  | 200  | 400  |
| c     | 100  | 400  | 800  |
| d     | 100  | 800  | 1600 |

And my corresponding loss:

```python
# One-hot encoded trinary labels, hence 12-dim output for 4 labels
nn.BCEWithLogitsLoss(pos_weight=torch.tensor([1, 1, 1,
                                              4, 2, 1,
                                              8, 2, 1,
                                              16, 2, 1]))
```

My question is, what is the mathematically correct way to correct for the expected data distribution at actual inference time? I don't think dividing the logits directly by the weights is correct:

```python
output = model(data.to(device))
output = output.detach().cpu() / np.array([1, 1, 1,
                                           4, 2, 1,
                                           8, 2, 1,
                                           16, 2, 1])
output = output.reshape((-1, 4, 3))
output = softmax(output)
```

Hi Victor!

Think through with care what you are trying to do here -- things look a
little bit garbled.

Most likely you don't want one-hot encoding nor an eight-dimensional
output.

A *multi-label* classifier consists of a set of binary classifiers, one for each
of your four classes. Thus "a -- yes, no," "b -- yes, no," etc.

When using PyTorch and `BCEWithLogitsLoss`, you would want to have
four (floating-point) labels for each sample -- `0.0` vs. `1.0` for each of your
four classes. So `labels` should be a `FloatTensor` of shape `[nBatch, 4]`.

Your `output` would have the same type and shape and would be the
predicted logits for each of your four classes.
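A minimal sketch of that shape convention (batch size and label values here are made up for illustration):

```python
import torch
import torch.nn as nn

# four independent binary ("yes/no") classifiers, re-weighted per class
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([1.0, 4.0, 8.0, 16.0]))

logits = torch.randn(2, 4)                     # [nBatch, 4] predicted logits
labels = torch.tensor([[0.0, 1.0, 0.0, 1.0],
                       [1.0, 0.0, 0.0, 0.0]])  # [nBatch, 4] float 0.0/1.0 labels
loss = loss_fn(logits, labels)                 # scalar loss
```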

To me, "inference" means you take a sample and make a prediction. For
just this operation, you wouldn't perform any re-weighting. Any re-weighting
would have occurred during training, with the goal of training your model to
make better predictions.

(You may or may not wish to use re-weighting when computing performance
metrics for your validation or test datasets. If you compute a loss function
for your validation dataset, you generally want it to be the same loss function
as the one you compute for your training set so that the two will be directly
comparable. You would then typically use the same re-weighting for your
validation-dataset loss function.)

When performing "pure" binary classification, you do not ever want to
use `softmax()`. `softmax()` might be used for a multi-class, single-label
problem.
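Concretely, for binary (or multi-label) outputs you would map each logit through `sigmoid()` independently, e.g.:

```python
import torch

logits = torch.tensor([-1.0, 0.0, 2.0])
probs = torch.sigmoid(logits)   # independent probabilities, each in (0, 1)
preds = probs > 0.5             # threshold each binary decision separately
```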

One possible point of confusion: It is possible to recast a single-label binary
problem as a two-class (single-label) problem, and train it as a multi-class
(single-label) problem (that happens to have two classes) using
`CrossEntropyLoss`. (Your two-class labels would then be the one-hot
encoded version of your original binary labels.)
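To illustrate that recasting (a sketch showing the equivalence, not a recommendation for the multi-label case): the two-class `CrossEntropyLoss` on logits `(l0, l1)` matches `BCEWithLogitsLoss` on the single logit `l1 - l0`.

```python
import torch
import torch.nn as nn

binary_labels = torch.tensor([0.0, 1.0, 1.0])   # original binary labels
two_class_logits = torch.randn(3, 2)            # [nBatch, 2] two-class logits

# CrossEntropyLoss takes integer class labels (not one-hot floats)
ce = nn.CrossEntropyLoss()(two_class_logits, binary_labels.long())

# the equivalent single-logit binary formulation
single_logit = two_class_logits[:, 1] - two_class_logits[:, 0]
bce = nn.BCEWithLogitsLoss()(single_logit, binary_labels)
# ce and bce agree up to floating-point error
```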

It looks as though this approach might be being mixed into what you are
doing. While treating a binary problem as a two-class, multi-class problem
is fully legitimate, it doesn't play nicely with a multi-label problem (even
though the multi-label problem can be looked at as being a set of binary
problems).

Best.

K. Frank

Hi K, thanks for the detailed response! One thing though -- I actually posted it this way with binary labels to simplify things. In my actual problem, the labels are ordinal categories, hence how I ended up with the multi-hot encoded vector. What would you suggest in this case of multilabel+multiclass?

Hi Victor!

You will have to explain what you mean by "multilabel+multiclass" and
illustrate it with a concrete (if contrived) example.

Best.

K. Frank

Just updated the initial post with a multilabel trinary classification example.

For something more tangible, imagine something like a video classifier where the data labels are satisfaction surveys with 1-5 stars across multiple categories. Also imagine the distribution is skewed towards 5 stars by varying degrees, sometimes by multiple orders of magnitude.

My thinking here is, loss weighting should serve to amplify the gradients so that the model can learn something beyond just biases for the final linear layer. But I don't really want to trade precision for recall in my use case either, so I want to recalibrate when I am doing inference.

Hi Victor!

Okay, I understand what you're trying to do now.

First off:

This isn't right. You are working with a not-binary, multi-class problem
(which in your example, has three classes). You do not want to be using
`BCEWithLogitsLoss` nor any kind of one-hot encoding (even though
your problem is "multi-label" in the sense that you have four three-class
classifiers that share some upstream processing).

Let's say you are rating videos across four categories (your four "labels"),
say, "plot," "dialog," "acting," and "makeup," and each rating category has
three classes, "one star," "two stars," and "three stars."

You should treat each of these categories as a multi-class classification
problem and each should have its own instance of a `CrossEntropyLoss`
`loss_fn`, each instantiated with its own `weight` argument that accounts
for the class imbalance across that specific category's number-of-stars
classes. (`weight` is `CrossEntropyLoss`'s analog of `BCEWithLogitsLoss`'s
`pos_weight` constructor argument.)
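For example, one common (though not the only) choice is inverse-frequency weights computed per category from the class counts in the table above:

```python
import torch
import torch.nn as nn

# class counts per category (rows a..d of the table in the question)
counts = {
    "a": [100, 100, 100],
    "b": [100, 200, 400],
    "c": [100, 400, 800],
    "d": [100, 800, 1600],
}

loss_fns = {}
for name, c in counts.items():
    c = torch.tensor(c, dtype=torch.float)
    weight = c.sum() / c            # inverse-frequency class weights
    loss_fns[name] = nn.CrossEntropyLoss(weight=weight)
```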

Something like this:

Your model will have a shared "backbone" that all four categories use and
four classifier "heads," one for each category. Let's say that the last layer
of your backbone is a `Linear` with `out_features = 100`. Then:

```python
# your earlier backbone layers ...
self.last_backbone_layer = torch.nn.Linear(500, 100)
# four classifier heads, one per rating category, three classes each
self.headA = torch.nn.Linear(100, 3)   # likewise headB, headC, headD
```

And `forward()` would look something like this:

```python
x = self.last_backbone_layer(x)
return self.headA(x), self.headB(x), self.headC(x), self.headD(x)
```

Then

```python
outputA, outputB, outputC, outputD = model(input)
lossA = loss_fnA(outputA, labelsA)
lossB = loss_fnB(outputB, labelsB)
lossC = loss_fnC(outputC, labelsC)
lossD = loss_fnD(outputD, labelsD)
loss_total = lossA + lossB + lossC + lossD
```

Each of the "labels" tensors, e.g., `labelsA`, will be a `LongTensor` with
shape `[nBatch]` (with no class dimension) and consist of integer categorical
class labels whose values run over the values `0`, `1`, and `2`.

Make sure that you understand how to build and train a "conventional" (that
is, not "multi-label") multi-class classifier using `CrossEntropyLoss`, using
its `weight` constructor argument to compensate for class imbalance. This
is really just the same except that you have four such classifiers that share
a common backbone.
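Putting the pieces together, an end-to-end sketch might look like the following (the layer sizes and the input feature dimension of 500 are illustrative). Note that at inference time no re-weighting is applied; each head simply gets its own `softmax()` / `argmax()`:

```python
import torch
import torch.nn as nn

class MultiHeadClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # shared backbone; last layer has out_features = 100
        self.backbone = nn.Sequential(nn.Linear(500, 100), nn.ReLU())
        # four classifier heads, one per rating category, three classes each
        self.heads = nn.ModuleList(nn.Linear(100, 3) for _ in range(4))

    def forward(self, x):
        x = self.backbone(x)
        return [head(x) for head in self.heads]

model = MultiHeadClassifier()

# inference: per-head probabilities and predicted star class, no re-weighting
with torch.no_grad():
    outputs = model(torch.randn(2, 500))            # four [nBatch, 3] logit tensors
    probs = [torch.softmax(out, dim=1) for out in outputs]
    preds = [out.argmax(dim=1) for out in outputs]  # class indices 0, 1, or 2
```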

Best.

K. Frank
