Problem with converting my LSTM multi-class classification model to a binary classification model

Hello! I am a PyTorch newbie and trying to learn by following tutorials. I have implemented a model for a multi-class classification task, and now I’d like to use this model for a binary classification task. In the current model below, I’ve been using “CrossEntropyLoss” with a final “Linear” layer. My goal is to change this to “BCELoss” with a “Sigmoid” activation; however, this gives me a size mismatch error (batch_size = 64, n_hidden = 256, each target element is either 0 or 1, and each input element is a float value):

```text
Using a target size (torch.Size([64])) that is different to the input size (torch.Size([64, 256])) is deprecated. Please ensure they have the same size.
```

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl

class ModuleLSTM(nn.Module):
    def __init__(self, n_features, n_hidden=256, n_layers=3):
        super().__init__()  # required before assigning submodules
        self.lstm = nn.LSTM( ...params... )
        self.classifier = nn.Sigmoid()  # This was: self.classifier = nn.Linear(n_hidden, 2)

    def forward(self, x):
        _, (hidden, _) = self.lstm(x)
        hidden = torch.squeeze(hidden, 1)
        output = hidden[-1]
        return self.classifier(output)


class Classifier(pl.LightningModule):
    def __init__(self, n_features, n_classes=None):
        super().__init__()
        self.model = ModuleLSTM(n_features)  # This was: self.model = ModuleLSTM(n_features, n_classes)
        self.criterion = nn.BCELoss()        # This was: self.criterion = nn.CrossEntropyLoss()

    def forward(self, x, labels=None):
        output = self.model(x)
        loss = 0
        if labels is not None:
            loss = self.criterion(output, labels)
        return loss, output

    def training_step(self, batch, batch_idx):
        sequences, labels = batch["sequence"], batch["label"]
        loss, outputs = self(sequences, labels)
        predictions = torch.argmax(outputs, dim=1)
        stepAccuracy = accuracy(predictions, labels)
        return {"loss": loss, "accuracy": stepAccuracy}

    # validation_step, test_step
```

If I use “CrossEntropyLoss” and “Linear” instead of “BCELoss” and “Sigmoid”, the system works without any problem. To make it work, do I need to rewrite the forward method of ModuleLSTM so that it returns an output of size “torch.Size([64])”, or do I just need to reshape the return value of self.model(x) from “torch.Size([64, 256])” to “torch.Size([64])”? I am afraid of losing information if I follow the second approach.


I’m not completely sure how the output shape of your model ([batch_size=64, nb_classes=256]) fits a binary classification use case.
If you want to transform your current use case from a multi-class classification with 256 classes to a binary classification, the model output would be [64, 1] and the target would have the same shape.
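To make the [64, 1] suggestion concrete, here is a minimal sketch (not code from the thread), assuming the LSTM's last hidden state has shape [64, 256] and the dataset yields integer 0/1 labels of shape [64]; the random tensors are placeholders. It uses nn.BCEWithLogitsLoss rather than a separate Sigmoid followed by nn.BCELoss, since the combined loss is more numerically stable.

```python
import torch
import torch.nn as nn

batch_size, n_hidden = 64, 256

# Placeholder for the LSTM's last hidden state: one 256-dim vector per sample.
features = torch.randn(batch_size, n_hidden)

# A single output unit gives one logit per sample: shape [64, 1].
head = nn.Linear(n_hidden, 1)
logits = head(features)

# Integer 0/1 labels as they might come from the dataset: shape [64].
labels = torch.randint(0, 2, (batch_size,))

# BCEWithLogitsLoss expects float targets with the same shape as the logits.
targets = labels.float().unsqueeze(1)  # [64] -> [64, 1]
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, targets)

print(logits.shape, targets.shape, loss.item())
```

The important step is `labels.float().unsqueeze(1)`, which turns the [64] integer target into the float [64, 1] tensor the loss expects.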

However, since the model output already doesn’t match a binary classification, are you maybe working on a multi-label classification, which would allow each sample to belong to zero, one, or multiple classes? In that case, the target should also have the same shape as the model output, i.e. [64, 256].
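For the multi-label case, a minimal sketch could look like the following, assuming 256 independent labels and multi-hot float targets; the tensors are random placeholders rather than real model outputs.

```python
import torch
import torch.nn as nn

batch_size, n_labels = 64, 256

# Placeholder for the model output: one logit per label and sample.
logits = torch.randn(batch_size, n_labels)

# Multi-hot targets: each sample may have zero, one, or several active labels.
targets = torch.randint(0, 2, (batch_size, n_labels)).float()

# Input and target share the shape [64, 256], so BCEWithLogitsLoss accepts them.
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, targets)

# Each label is thresholded independently instead of taking an argmax.
predictions = (torch.sigmoid(logits) > 0.5).long()
```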

Hey @ptrblck , thank you for your answer! Actually, the number of classes is not 256, but it is 2: either 0 or 1. In this current version (multi-class classification), I call the “Classifier” in this way:

```python
model = Classifier(n_features=numFeatures, n_classes=2)
```

and, as you can see in the __init__ method of class “ModuleLSTM”, the last layer is

```python
self.classifier = nn.Linear(n_hidden, n_classes)  # where n_classes = 2
```

and in the __init__ method of “Classifier”:

```python
self.model = ModuleLSTM(n_features, n_classes)
self.criterion = nn.CrossEntropyLoss()
```

As you can see, this is a configuration for multi-class classification. Since I have only 2 classes (0/1), I’d like to modify this model for binary classification. Therefore, I am trying to migrate from CrossEntropyLoss to BCELoss and from a final Linear layer to Sigmoid. The problem is that, according to the PyTorch LSTM documentation, the shape of the returned “hidden” tensor in

```python
_, (hidden, _) = self.lstm(x)
```

is (num_layers * num_directions, batch, hidden_size). Since I use n_hidden=256, hidden[-1], which I pass to the classifier, has shape [64, 256], and I get a size mismatch error. This is why I wonder whether I also have to rewrite the forward method of my “ModuleLSTM” to adapt my model to binary classification.
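The shapes described here can be checked directly. This sketch assumes illustrative values for n_features and the sequence length, plus batch_first=True (none of which appear in the thread); note that batch_first does not change the layout of the returned hidden state. It also shows why reshaping cannot be the fix: [64, 256] holds 16384 values, which cannot be reshaped into 64.

```python
import torch
import torch.nn as nn

# Assumed sizes: seq_len and n_features are illustrative only.
batch_size, seq_len, n_features, n_hidden, n_layers = 64, 10, 8, 256, 3

lstm = nn.LSTM(input_size=n_features, hidden_size=n_hidden,
               num_layers=n_layers, batch_first=True)
x = torch.randn(batch_size, seq_len, n_features)

_, (hidden, _) = lstm(x)
# hidden: (num_layers * num_directions, batch, hidden_size) -> [3, 64, 256]

# squeeze(hidden, 1) only removes dim 1 if it has size 1, i.e. when batch == 1.
squeezed = torch.squeeze(hidden, 1)

last_layer = hidden[-1]  # [64, 256]: final hidden state of the top layer

# 64 * 256 = 16384 values cannot be viewed as 64 values.
try:
    last_layer.reshape(batch_size)
    reshape_ok = True
except RuntimeError:
    reshape_ok = False

# A learned projection maps each 256-dim vector to one score instead.
to_logit = nn.Linear(n_hidden, 1)
scores = to_logit(last_layer).squeeze(1)  # [64]
```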

It seems you are setting n_classes to 2 while creating the module, but are never using it inside the module.
For a binary classification use case you could either use an output layer returning logits in the shape [batch_size, 2], treat it as a 2-class multi-class classification, and use nn.CrossEntropyLoss, or alternatively return logits with the shape [batch_size, 1], treat it as a binary classification, and use nn.BCEWithLogitsLoss.
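Both options can be sketched side by side. This is a minimal illustration, assuming a [64, 256] feature tensor standing in for hidden[-1] and integer 0/1 labels of shape [64]; it also shows how predictions differ between the two setups.

```python
import torch
import torch.nn as nn

batch_size, n_hidden = 64, 256
features = torch.randn(batch_size, n_hidden)  # placeholder for hidden[-1]
labels = torch.randint(0, 2, (batch_size,))   # int64 0/1 labels, shape [64]

# Option A: two logits per sample, treated as 2-class multi-class.
head_a = nn.Linear(n_hidden, 2)
logits_a = head_a(features)                       # [64, 2]
loss_a = nn.CrossEntropyLoss()(logits_a, labels)  # target: class indices [64]
pred_a = torch.argmax(logits_a, dim=1)            # [64]

# Option B: one logit per sample, treated as binary.
head_b = nn.Linear(n_hidden, 1)
logits_b = head_b(features).squeeze(1)                     # [64]
loss_b = nn.BCEWithLogitsLoss()(logits_b, labels.float())  # float targets, same shape
pred_b = (logits_b > 0).long()  # equivalent to sigmoid(logits) > 0.5
```

With option B, an argmax over dim=1 of a [64, 1] output always returns 0, so predictions have to come from a threshold instead.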

The logits could probably be created by the nn.LSTM module or you could use an additional nn.Linear layer, which would take the lstm outputs and return the logits in the desired shape.
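One way to follow the second suggestion is to keep the LSTM and add a linear head whose size is configurable. This is a hypothetical rewrite, not the original poster's code: `n_outputs` is an illustrative parameter name, and `batch_first=True` is an assumption. Passing 2 matches the nn.CrossEntropyLoss setup; passing 1 matches nn.BCEWithLogitsLoss.

```python
import torch
import torch.nn as nn

class ModuleLSTM(nn.Module):
    """Hypothetical sketch: LSTM encoder followed by a linear head returning logits."""

    def __init__(self, n_features, n_outputs=2, n_hidden=256, n_layers=3):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=n_hidden,
                            num_layers=n_layers, batch_first=True)
        self.classifier = nn.Linear(n_hidden, n_outputs)

    def forward(self, x):
        _, (hidden, _) = self.lstm(x)       # hidden: [n_layers, batch, n_hidden]
        return self.classifier(hidden[-1])  # raw logits: [batch, n_outputs]


# Usage with illustrative sizes: batch 64, sequence length 10, 8 features.
x = torch.randn(64, 10, 8)
multi_class = ModuleLSTM(n_features=8, n_outputs=2)  # use with CrossEntropyLoss
binary = ModuleLSTM(n_features=8, n_outputs=1)       # use with BCEWithLogitsLoss
out_multi = multi_class(x)
out_binary = binary(x)
```

No Sigmoid is applied inside the module, because both suggested losses expect raw logits.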

Hello @ptrblck , thank you for the answer!

It seems you are setting n_classes to 2 while creating the module, but are never using it inside the module.

I am passing the ‘n_classes’ to nn.Linear.

Actually, I decided to leave my model as a multi-class classifier because, in addition to the current dataset with only classes 0 and 1, I may extend it to classes 0, 1, 2, … in the coming days. As far as I understand, even if I build the model as a multi-class classifier, I can feed it a dataset containing only 0/1 labels.
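That understanding can be checked with a small sketch, assuming the output layer is sized for 3 classes while the current labels are only 0 and 1 (the tensors are placeholders). nn.CrossEntropyLoss only requires each target index to lie in [0, n_classes - 1], so unused classes are allowed; training simply pushes their scores down. The output size is fixed when the model is built, though, so adding classes later means changing the last layer and retraining.

```python
import torch
import torch.nn as nn

n_classes = 3                        # assumed future number of classes
logits = torch.randn(64, n_classes)  # placeholder for the model output
labels = torch.randint(0, 2, (64,))  # current dataset: only 0 and 1 appear

# Valid as long as every label is smaller than n_classes.
loss = nn.CrossEntropyLoss()(logits, labels)

# Predictions can still land on the unused class 2, especially before training.
preds = logits.argmax(dim=1)
```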