Hello! I am a PyTorch newbie trying to learn by following tutorials. I have implemented a model for a multi-class classification task, and now I'd like to adapt it for binary classification. The current model below uses "CrossEntropyLoss" with a "Linear" output layer. My goal is to switch to "BCELoss" with a "Sigmoid" activation; however, this causes a size-mismatch error (batch_size = 64, n_hidden = 256; each target element is either 0 or 1, and each input element is a float):
Using a target size (torch.Size([64])) that is different to the input size (torch.Size([64, 256])) is deprecated. Please ensure they have the same size.
class ModuleLSTM(nn.Module):
    def __init__(self, n_features, n_hidden=256, n_layers=3):
        super().__init__()
        self.lstm = nn.LSTM( ...params... )
        self.classifier = nn.Sigmoid()  # This was: self.classifier = nn.Linear(n_hidden, 2)

    def forward(self, x):
        self.lstm.flatten_parameters()
        _, (hidden, _) = self.lstm(x)
        hidden = torch.squeeze(hidden, 1)
        output = hidden[-1]
        return self.classifier(output)
class Classifier(pl.LightningModule):
    def __init__(self, n_features, n_classes=None):
        super().__init__()
        self.model = ModuleLSTM(n_features)  # This was: self.model = ModuleLSTM(n_features, n_classes)
        self.criterion = nn.BCELoss()  # This was: self.criterion = nn.CrossEntropyLoss()

    def forward(self, x, labels=None):
        output = self.model(x)
        loss = 0
        if labels is not None:
            loss = self.criterion(output, labels)
        return loss, output

    def training_step(self, batch, batch_idx):
        sequences, labels = batch["sequence"], batch["label"]
        loss, outputs = self(sequences, labels)
        predictions = torch.argmax(outputs, dim=1)
        stepAccuracy = accuracy(predictions, labels)
        return {"loss": loss, "accuracy": stepAccuracy}

    # validation_step, test_step
If I use "CrossEntropyLoss" with the "Linear" layer instead of "BCELoss" with "Sigmoid", the system works without any problem. To make this work, do I need to rewrite the forward method of ModuleLSTM so that it returns an output of size torch.Size([64]), or is it enough to just reshape the return value of self.model(x) from torch.Size([64, 256]) to torch.Size([64])? I am afraid of losing information if I follow the second approach.
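For reference, here is a minimal standalone snippet (random tensors only, no LSTM; the variable names are just for illustration) that reproduces the mismatch as I understand it — since Sigmoid is applied elementwise, it keeps the [64, 256] shape instead of reducing it to match the [64] targets:

```python
import torch
import torch.nn as nn

batch_size, n_hidden = 64, 256

# Stand-in for what forward() currently returns: the last hidden state, [64, 256]
hidden = torch.randn(batch_size, n_hidden)

# Sigmoid is elementwise, so the shape is unchanged: still [64, 256]
output = nn.Sigmoid()(hidden)

# Binary targets, one per sample: [64]
labels = torch.randint(0, 2, (batch_size,)).float()

print(output.shape, labels.shape)  # torch.Size([64, 256]) torch.Size([64])
# nn.BCELoss()(output, labels)  # would raise the size-mismatch complaint quoted above
```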
Thanks!