[SOLVED] Class Weighed Binary Crossentropy not working, even with equal weights

KFrank · June 4, 2019, 7:46pm

Hi Andrei!

tzeny:

Hello,

I have tried using the following custom loss class:
# class BCEWithLogitsLoss():
class CustomWBCE():
    def __init__(self, class_weights=None, **kwargs):
        self.class_weights = class_weights
...
…
Sometimes my model output a large negative value (-136, for example) and torch.sigmoid’s result was 0., which led to a -inf in the torch.log; that’s why I added the torch.clamp.

I have tried it both with weights computed with the formula:
total = negative + positive
w0 = positive / total
w1 = negative / total
And with weights (0.5, 0.5) to test it out.
…

First, to answer the question I think you’re asking:

You should be using (as in the comment in your code)
BCEWithLogitsLoss.

BCEWithLogitsLoss supports sample weights (through its weight
argument), which you can use for class weights.

Let’s say you have class weight w_1 for class 1, and w_0
for class 0. Let w_n be the sample weight for sample n.
Simply set w_n = w_1 if y_n = 1, and w_n = w_0 if
y_n = 0. (This assumes that the y_n are either 0 or 1, as
they should be if they are binary class labels.)
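
As a minimal sketch of the recipe above: the helper name
class_weighted_bce_with_logits and the example numbers are made up
for illustration. It uses the functional form,
torch.nn.functional.binary_cross_entropy_with_logits, because the
sample weights depend on the labels of each batch, and it assumes
the targets are floats equal to 0.0 or 1.0.

```python
import torch
import torch.nn.functional as F

def class_weighted_bce_with_logits(logits, targets, w0, w1):
    # Hypothetical helper: builds w_n = w1 where y_n == 1 and
    # w_n = w0 where y_n == 0, then passes w_n as the sample weight.
    sample_weights = w1 * targets + w0 * (1.0 - targets)
    return F.binary_cross_entropy_with_logits(
        logits, targets, weight=sample_weights
    )

# Illustrative values only.
logits = torch.tensor([2.0, -1.0, 0.5, -3.0])
targets = torch.tensor([1.0, 0.0, 0.0, 1.0])
loss = class_weighted_bce_with_logits(logits, targets, w0=0.25, w1=0.75)
print(loss.item())
```

With the default reduction, the result is the mean of the weighted
per-sample losses, so it is not renormalized by the sum of the weights.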

Now some comments:

Note that using class weights w_1 = w_0 = 1/2 doesn’t give
you the same result as an unweighted loss function. It gives
you 1/2 the unweighted loss function. (The loss for each sample
is multiplied by 1/2). This doesn’t matter a lot, but, for example,
with plain-vanilla stochastic-gradient-descent optimization, it
has the effect of reducing your learning rate by a factor of 1/2.
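
A quick check of the factor of 1/2, as a sketch with made-up logits
and targets: both the loss and its gradient with respect to the
logits are halved, which is why plain SGD behaves as if the learning
rate were halved.

```python
import torch
import torch.nn.functional as F

# Illustrative values only.
logits = torch.tensor([1.5, -0.3, 0.8], requires_grad=True)
targets = torch.tensor([1.0, 0.0, 1.0])

unweighted = F.binary_cross_entropy_with_logits(logits, targets)
half = torch.full_like(targets, 0.5)  # w_1 = w_0 = 1/2 for every sample
weighted = F.binary_cross_entropy_with_logits(logits, targets, weight=half)

# Gradients of each loss with respect to the logits.
(grad_unweighted,) = torch.autograd.grad(unweighted, logits)
(grad_weighted,) = torch.autograd.grad(weighted, logits)

loss_ratio = (weighted / unweighted).item()
print(loss_ratio)
print(grad_weighted / grad_unweighted)
```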

Instead of clamping the sigmoid of your output, you should be
using torch.nn.LogSigmoid. This avoids the problem of
large negative → sigmoid → 0 → log → -inf.
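
To illustrate with the -136 from your post (a sketch, using the
functional form torch.nn.functional.logsigmoid rather than the
module): since 1 - sigmoid(x) = sigmoid(-x), both log terms of the
binary cross entropy can be written with logsigmoid, and no clamp is
needed. The second logit and the targets are made up for the example.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-136.0, 2.0])   # logits; -136 is the value from the post
y = torch.tensor([1.0, 0.0])      # illustrative binary targets

# Naive route: sigmoid(-136) underflows to 0 in float32, so log gives -inf.
naive_log = torch.log(torch.sigmoid(x))

# Stable route: logsigmoid works on the logit directly.
stable_log = F.logsigmoid(x)

# Per-sample BCE written with logsigmoid only:
#   log(sigmoid(x))     = logsigmoid(x)
#   log(1 - sigmoid(x)) = logsigmoid(-x)
per_sample = -(y * F.logsigmoid(x) + (1.0 - y) * F.logsigmoid(-x))

print(naive_log)
print(stable_log)
print(per_sample)
```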

In general, when you are testing / debugging something like
this, instead of running your full training code with a “default”
value like weights = (0.5, 0.5), you should try calling your
function on a single sample, with your default value and
compare the single numerical result with the result of the
standard unweighted function you are trying to mimic (in
this case BCEWithLogitsLoss). Only when you are happy
that you have that working should you try running a single
batch, and when that is working, try the training.
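
Such a single-sample check might look like the sketch below. Here
my_weighted_bce is a hypothetical stand-in for whatever custom loss
you are testing, not your actual class, and the logit and target are
arbitrary. With weights (1.0, 1.0) it should reproduce
BCEWithLogitsLoss exactly.

```python
import torch
import torch.nn.functional as F

def my_weighted_bce(logit, target, w0, w1):
    # Hypothetical custom loss under test (placeholder implementation).
    per_sample = -(
        w1 * target * F.logsigmoid(logit)
        + w0 * (1.0 - target) * F.logsigmoid(-logit)
    )
    return per_sample.mean()

# One sample, arbitrary values.
logit = torch.tensor([0.7])
target = torch.tensor([1.0])

reference = torch.nn.BCEWithLogitsLoss()(logit, target)
candidate = my_weighted_bce(logit, target, w0=1.0, w1=1.0)

print(reference.item(), candidate.item())
assert torch.allclose(reference, candidate)
```

Once the two numbers agree for one sample, repeat the comparison on
a single batch before going back to the training loop.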

Lastly, I think this discussion, especially the comment about
avoiding clamping, applies to your earlier thread and its
linked thread.

Best regards.

K. Frank