Playing with BCEWithLogitsLoss

I am playing with BCEWithLogitsLoss to try to build a classifier. My problem does not strictly require a neural network to be solved, but I am curious and want to make it work anyway.

Please find the code for the model below:

import torch
import torch.nn as nn
import torch.nn.functional as F


class BinaryClassifier(nn.Module):

    def __init__(self, input_features, hidden_dim, output_dim):
        super(BinaryClassifier, self).__init__()

        self.fc1 = nn.Linear(input_features, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.fc3 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        # pass self.training so dropout is only active during training
        x = F.dropout(F.relu(self.fc2(x)), p=0.5, training=self.training)
        # no activation on the last layer: BCEWithLogitsLoss expects raw logits
        x = self.fc3(x)

        # remove the trailing dimension of size 1 so the shape matches the targets
        return torch.squeeze(x)

And here is the training loop:

import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_loader = get_train_data_loader(...)

model = BinaryClassifier(args.input_features, args.hidden_dim, args.output_dim).to(device)

optimizer = optim.Adam(model.parameters(), lr=0.1)
criterion = torch.nn.BCEWithLogitsLoss()

# create the scheduler once; re-creating it inside the loop resets it every batch
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 100, gamma=0.1, last_epoch=-1)

epochs = 1000

for epoch in range(1, epochs + 1):
    model.train()
    total_loss = 0

    for batch in train_loader:
        # get data
        batch_x, batch_y = batch

        batch_x = batch_x.to(device)
        batch_y = batch_y.to(device)

        optimizer.zero_grad()

        # get predictions (raw logits) from the model
        y_pred = model(batch_x)

        # perform backprop
        loss = criterion(y_pred, batch_y)
        loss.backward()
        optimizer.step()
        scheduler.step()

        total_loss += loss.item()

The training data is tabular: 3 features, around 100 samples.
The target is the binary class label: 0 or 1.

After 1000 epochs (too many, in my opinion), the loss stays quite high and the outputs of the model still look weird to me (negative values, very large values, …).

I sense I am doing something wrong with how I am using the loss and with the configuration of the last layer. BCEWithLogitsLoss expects, as the name implies, logits. That means the raw output of the neural network, right? Or am I missing something?
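
For reference, here is a minimal sanity check of that understanding, with made-up numbers, comparing BCEWithLogitsLoss on raw scores against BCELoss on sigmoid outputs:

import torch

logits = torch.tensor([-2.0, 0.5, 3.0])   # raw, unbounded model outputs
targets = torch.tensor([0.0, 1.0, 1.0])

loss_with_logits = torch.nn.BCEWithLogitsLoss()(logits, targets)
loss_manual = torch.nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_with_logits, loss_manual)  # both print the same value (≈ 0.2165)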

Could someone point me in the right direction?

Thanks,
Kind regards.

BCEWithLogitsLoss combines a sigmoid activation with the binary cross-entropy loss. If you want to inspect the outputs of your model later, you have to apply the sigmoid yourself to map the raw logits to probabilities between 0 and 1, matching the range of your target labels. A large negative logit, for example, becomes a probability close to 0 after the sigmoid.

But if all outputs are negative, then all predicted probabilities would be close to 0, which could well be wrong! And after 1000 epochs on such a small dataset, your model will most likely overfit.
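
For example, a minimal evaluation sketch (assuming your trained model and hypothetical held-out tensors x_val / y_val):

model.eval()  # also disables dropout
with torch.no_grad():
    logits = model(x_val)          # raw, unbounded scores
    probs = torch.sigmoid(logits)  # map into (0, 1)
    preds = (probs > 0.5).float()  # threshold to get class 0 or 1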

How big is your loss? Have you tried calculating the accuracy? That could be a more easily interpretable metric.
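
Continuing the sketch above, accuracy is just the fraction of thresholded predictions that match the labels:

accuracy = (preds == y_val).float().mean().item()
print(f"accuracy: {accuracy:.3f}")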


I think you are spot on.
I did not apply the sigmoid to the output of model(x) during evaluation, so that must be the issue, since the activation is indeed carried out inside the loss.

I understand BCEWithLogitsLoss is preferred for numerical stability, but one does indeed have to be careful at evaluation time.