When do I turn prediction numbers into 1 and 0 for binary classification?

I have a binary classification problem. Right now I’m using several linear layers with ReLU activation.
I’m using BCEWithLogitsLoss() as the loss, so I’m not applying any Softmax to the layers.
Predictions of the model look something like this:
-0.2443, 6.6122, 25.0909, ..., -62.7383, 0.3066, 61.6255
So I can’t really compare them to the zeros and ones of my labels. I don’t understand at what point, and how, I should convert my predictions into 1s and 0s.

In case it’s important, my model currently looks like this:


    import torch
    import torch.nn as nn

    # The class name below is a placeholder; the imports and class line were added for completeness
    class TabularBinaryClassifier(nn.Module):

        def __init__(self, num_cols, cat_cols, embedding_size_dict, n_classes,
                     embedding_dim_dict=None, learning_rate=0.01):
            super().__init__()

            self.cat_cols = cat_cols
            self.num_cols = num_cols
            self.embeddings, total_embedding_dim = self._create_embedding_layers(
                cat_cols, embedding_size_dict, embedding_dim_dict)

            in_features = len(num_cols) + total_embedding_dim
            self.layers = nn.Sequential(
                nn.Linear(in_features, 128),
                nn.ReLU(),
                nn.Linear(128, 256),
                nn.ReLU(),
                nn.Linear(256, n_classes)
            )

        @staticmethod
        def _create_embedding_layers(cat_cols, embedding_size_dict, embedding_dim_dict):
            """Construct one embedding layer per categorical variable."""
            total_embedding_dim = 0
            embeddings = {}
            for col in cat_cols:
                embedding_size = embedding_size_dict[col]
                embedding_dim = embedding_dim_dict[col]
                total_embedding_dim += embedding_dim
                embeddings[col] = nn.Embedding(embedding_size, embedding_dim)

            return nn.ModuleDict(embeddings), total_embedding_dim

        def forward(self, num_tensor, cat_tensor):
            # Embed each categorical column and collect the results
            cat_outputs = []
            for i, col in enumerate(self.cat_cols):
                embedding = self.embeddings[col]
                cat_output = embedding(cat_tensor[:, i])
                cat_outputs.append(cat_output)

            # Concatenate the numeric features with all embedding outputs
            cat_outputs = torch.cat(cat_outputs, dim=1)
            all_outputs = torch.cat((num_tensor, cat_outputs), dim=1)

            # Raw logits; squeeze gives shape (batch,) when n_classes == 1
            final_outputs = self.layers(all_outputs).squeeze(dim=1)
            return final_outputs

Hello @julliet,

Basically, since you decided not to apply a Sigmoid or Softmax inside the model, the outputs are raw logits rather than probability-like values. If you need probabilities, apply any continuous function that maps (-inf, +inf) to [0, 1] to the logits. If you then pass f(logits) to the criterion, use BCELoss rather than BCEWithLogitsLoss, since the latter expects raw logits and applies the Sigmoid internally.

Examples of such mappings: f(logits) = 0.5 * (1 + tanh(logits / 2)) or the Sigmoid (these are actually the same function).
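
A minimal sketch of this conversion in PyTorch, reusing the example logits from the question:

    import torch

    # Raw logits as they come out of the model
    logits = torch.tensor([-0.2443, 6.6122, 25.0909, -62.7383, 0.3066, 61.6255])

    # Map logits to probabilities in [0, 1]
    probs = torch.sigmoid(logits)

    # The tanh-based mapping above is the same function as the sigmoid
    probs_tanh = 0.5 * (1 + torch.tanh(logits / 2))
    print(torch.allclose(probs, probs_tanh))  # True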


Hi Julia!

In terms of comparing your predictions to the “zeroes and ones” of
your label, BCEWithLogitsLoss does precisely this (without converting
your predictions into 1s and 0s).

BCEWithLogitsLoss takes predictions that are raw-score logits
(such as those produced by your final Linear layer and that run
from -inf to inf) and compares them with ground-truth labels that
are zeros and ones (or more generally, with ground-truth labels that
are probabilities between zero and one). That is, not converting your
predictions (that are logits) into zeros and ones before passing them
to BCEWithLogitsLoss is the correct thing to do. (In this situation,
BCEWithLogitsLoss will likely be the loss function you use for training.)
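
For instance, a minimal sketch with made-up logits and float zero-one labels:

    import torch
    import torch.nn as nn

    criterion = nn.BCEWithLogitsLoss()

    # Hypothetical batch: raw logits from the model and float 0/1 ground-truth labels
    logits = torch.tensor([-0.2443, 6.6122, -62.7383, 0.3066], requires_grad=True)
    labels = torch.tensor([0.0, 1.0, 0.0, 1.0])

    loss = criterion(logits, labels)  # logits go in as-is; the sigmoid is applied internally
    loss.backward()                   # usable directly for training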

On the other hand, if you wish to compute the accuracy of your
predictions (an evaluation metric that you would most likely not use
for training), that is, the percentage of your yes-no predictions that are
correct, you do want to convert your predictions to zeros and ones,
and then simply count how many are equal to your zero-and-one
ground-truth labels.

A logit of 0.0 corresponds to a probability (of being in the “1”-class)
of 0.5, so one would typically threshold the logit against 0.0:

accuracy = ((predictions > 0.0) == labels).float().mean()
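
As a small sketch with made-up values, thresholding the logits at 0.0 gives the same result as thresholding the sigmoid probabilities at 0.5:

    import torch

    # Hypothetical logits and zero/one labels
    predictions = torch.tensor([-0.2443, 6.6122, -62.7383, 0.3066])
    labels = torch.tensor([0.0, 1.0, 0.0, 1.0])

    # Threshold the raw logits at 0.0 ...
    accuracy = ((predictions > 0.0) == labels.bool()).float().mean()

    # ... which is equivalent to thresholding the probabilities at 0.5
    probs = torch.sigmoid(predictions)
    accuracy_from_probs = ((probs > 0.5) == labels.bool()).float().mean()

    print(accuracy.item(), accuracy_from_probs.item())  # both 1.0 for this toy batch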

Best.

K. Frank


Hello K. Frank!
Thank you for your thorough answer. That’s exactly what I needed. I trained the model, but I still needed to compute accuracy, which is what I needed the zeros and ones for.

I did use Sigmoid and the BCELoss function on my first iteration, but my loss stayed at the same value during training. So I decided to remove the Sigmoid and switch to BCEWithLogitsLoss.