Imbalanced data set and unable to get BCEWithLogitsLoss to work

I have a binary dependent variable and I am unclear on how to:

  1. get BCEWithLogitsLoss to work
  2. incorporate pos_weight (how exactly do I calculate the weights, and is it based on the total data set?). One class has 7000 observations and the other has 2224 in the total data set. Should it just be a tensor like torch.tensor([0.3, 0.7]) (more emphasis on the positive samples, which are under-represented)?

For my loss function, CrossEntropyLoss is working, but I believe BCEWithLogitsLoss should be used instead (I think?).

My model outputs logits that look like:

tensor([[ 0.5015, -0.0165],
        [ 0.5486,  0.0320],
        [ 0.4227,  0.1604],
        [ 0.2781, -0.0317],
        [ 0.2667,  0.2109],
        [ 0.1847, -0.1724],
        [ 0.2727, -0.0598],
        [ 0.3827,  0.1195],
        [ 0.2796, -0.2183],
        [ 0.6082, -0.1816],
        [ 0.4710, -0.0551],
        [ 0.1589,  0.0477],

My labels look like:

tensor([[0],
        [0],
        [0],
        [0],
        [0],
        [0],
        [0],
        [0],
        [1],
        [0],
        [0],
        [1],
        [0],
        [0],

The error I get with BCEWithLogitsLoss is always: bool value of Tensor with more than one value is ambiguous.

I believe I got the weights to work appropriately:

import torch

# helper function to count the target distribution inside a tensor dataset
# (each item is expected to hold its label at index 1)
def target_count(tensor_dataset):
    count0 = 0
    count1 = 0
    for i in tensor_dataset:
        if i[1].item() == 0:
            count0 += 1
        elif i[1].item() == 1:
            count1 += 1
    # returns tensor([number of 0 labels, number of 1 labels])
    return torch.tensor([count0, count1])


# prepare weighted sampling for imbalanced classification
# (target_tensor is unused; the counts are recomputed from tensor_dataset)
def create_sampler(target_tensor, tensor_dataset):
    class_sample_count = target_count(tensor_dataset)

    # weight each class by the inverse of its frequency
    weight = 1. / class_sample_count.float()
    # look up the class weight of every sample via its label
    samples_weight = torch.tensor([weight[int(t[1])].item() for t in tensor_dataset])
    sampler = torch.utils.data.WeightedRandomSampler(weights=samples_weight,
                                                     num_samples=len(samples_weight),
                                                     replacement=True)
    return sampler


train_sampler = create_sampler(target_count(train_dataset), train_dataset)
val_sampler = create_sampler(target_count(val_dataset), val_dataset)
test_sampler = create_sampler(target_count(test_dataset), test_dataset)
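
For reference, the per-sample weights above can also be computed without Python loops. The following is only a minimal sketch, assuming the labels are available as one integer tensor of 0s and 1s; the toy `features` and `labels` tensors and all variable names are made up for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy data (hypothetical): 8 negatives and 2 positives.
features = torch.randn(10, 4)
labels = torch.tensor([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# Count samples per class, then weight each sample by 1 / (its class count).
class_counts = torch.bincount(labels, minlength=2)
class_weights = 1.0 / class_counts.float()
sample_weights = class_weights[labels]

sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)

# A sampler takes the place of shuffle=True; the two options cannot be combined.
loader = DataLoader(TensorDataset(features, labels), batch_size=5, sampler=sampler)
```

Each class ends up with the same total weight, so both classes should be drawn with roughly equal probability.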

Which line of code is raising this error?
Is it some code in your posted code snippets or is it raised by BCEWithLogitsLoss during the training?

Thanks for the response!

So I think I figured it out, and it seems like it was a silly issue. I am very new to PyTorch and have been jumping into new transformer models while also playing with “old” CNNs. The transformer BERT model, for which I had an extensive guide, worked fine and output logits like the following for a two-label classification task:

tensor([[ 0.5015, -0.0165],
        [ 0.5486,  0.0320],
        [ 0.4227,  0.1604],
        [ 0.2781, -0.0317],
        [ 0.2667,  0.2109],
        [ 0.1847, -0.1724],
        [ 0.2727, -0.0598],
        [ 0.3827,  0.1195],
        [ 0.2796, -0.2183],
        [ 0.6082, -0.1816],
        [ 0.4710, -0.0551],
        [ 0.1589,  0.0477],

For my CNN, I mistakenly set the number of outputs to 2 (to make the output look like that of the working transformer above) instead of 1. I assumed this was just a difference between PyTorch and Keras, where the last layer for a two-label classification problem looks like keras.layers.Dense(1, activation="sigmoid").

With the number of classes in my configuration now set to 1 instead of 2, BCEWithLogitsLoss seems to work as intended:

criterion = nn.BCEWithLogitsLoss()

        # `batch` contains two pytorch tensors:
        #   [0]: input ids
        #   [1]: labels
        b_input_ids = batch[0].cuda()
        b_labels = batch[1].cuda().float()  # BCEWithLogitsLoss expects float labels

        # clear previously calculated gradients
        model.zero_grad()

        # forward propagation (evaluate model on training batch)
        logits = model(b_input_ids)

        # calculate binary cross entropy loss
        loss = criterion(logits, b_labels)
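
To come back to the pos_weight question from the first post: the nn.BCEWithLogitsLoss documentation suggests setting pos_weight to the ratio of negative to positive samples, and it holds one value per output rather than one value per class. The following is a minimal sketch, not a definitive recipe. It assumes that the 2224-sample class is the positive one (label 1) and that the counts come from the data used for training; the toy logits and labels are made up, and it runs on the CPU for simplicity.

```python
import torch
import torch.nn as nn

# Counts from the question; assumes the minority class is the positive class.
num_neg, num_pos = 7000, 2224

# One value for the single output: negatives / positives (about 3.15).
pos_weight = torch.tensor([num_neg / num_pos])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Toy batch (hypothetical): logits and float labels share the shape [batch, 1].
logits = torch.tensor([[0.5], [-0.2], [0.1], [0.3]])
labels = torch.tensor([[0.0], [0.0], [1.0], [0.0]])

loss = criterion(logits, labels)
```

With pos_weight above 1, errors on positive samples contribute more to the loss than errors on negative samples.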

While this is not the place for huggingface/transformers discussion, I am curious as to why, for binary classification, the model outputs 2 columns of logits like the above instead of 1 column. I also realize that the transformers package has set up its API (https://huggingface.co/transformers/model_doc/bert.html) so that:

num_labels = 1: it's regression
num_labels >= 2: it's classification

which may be why.
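
One way to see that the two-column output carries the same information as a single logit: a softmax over two logits equals a sigmoid applied to their difference. The small check below is only an illustration, using the first three rows of the logits shown earlier in the thread; the variable names are made up.

```python
import torch

# First three rows of the two-column logits shown earlier in the thread.
two_logits = torch.tensor([[0.5015, -0.0165],
                           [0.5486,  0.0320],
                           [0.4227,  0.1604]])

# Probability of class 1 from a softmax over both columns.
p_softmax = torch.softmax(two_logits, dim=1)[:, 1]

# Same probability from a sigmoid of the logit difference (class 1 minus class 0).
p_sigmoid = torch.sigmoid(two_logits[:, 1] - two_logits[:, 0])
```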

Thanks for your time!

You could deal with a binary classification use case in different ways:

  • You could use a single output and treat it as the logit (or probability) of the positive class. For this use case you would use nn.BCEWithLogitsLoss (or nn.BCELoss, if you are applying a sigmoid at the end). Note that logits + nn.BCEWithLogitsLoss give you more numerical stability.
  • Alternatively, you could treat the binary classification as a 2-class multi-class classification. For this approach you would use two output units and either logits + nn.CrossEntropyLoss or F.log_softmax + nn.NLLLoss.
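
Both options can be sketched side by side. This is a minimal illustration with made-up toy tensors, not code from the thread. Note the different target formats: float targets with the same shape as the output for nn.BCEWithLogitsLoss, integer class indices for nn.CrossEntropyLoss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
targets = torch.tensor([0, 1, 0, 1])  # toy class indices

# Option 1: one output unit, float targets of shape [batch, 1].
single_logit = torch.randn(4, 1)
float_targets = targets.float().unsqueeze(1)
loss_bce_logits = nn.BCEWithLogitsLoss()(single_logit, float_targets)
# Equivalent, but less numerically stable: explicit sigmoid + nn.BCELoss.
loss_bce = nn.BCELoss()(torch.sigmoid(single_logit), float_targets)

# Option 2: two output units, integer targets of shape [batch].
two_logits = torch.randn(4, 2)
loss_ce = nn.CrossEntropyLoss()(two_logits, targets)
# Equivalent: log-softmax followed by nn.NLLLoss.
loss_nll = nn.NLLLoss()(F.log_softmax(two_logits, dim=1), targets)
```

Within each option the two variants should give the same loss value up to floating-point error.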

Thanks so much for the clear and detailed explanation. This makes a lot of sense!