BCEWithLogitsLoss() giving negative loss

Hi,

I am training a NN using PyTorch 1.7.0. When I use CrossEntropyLoss() as the loss function I don't get any negative loss in any epoch. However, this competition's evaluation metric is multi-class logarithmic loss, which I believe BCEWithLogitsLoss() in PyTorch serves for the multi-class case (correct me if I am wrong).

My question is: why is the loss negative when using BCEWithLogitsLoss(), and how can I prevent it? I don't want to use CrossEntropyLoss(). Please see the code below; for clarity I am printing the actual targets “y” and the model predictions “output” for one epoch only.

import torch
from torch import nn
import torch.optim as torch_optim
from torch.utils.data import DataLoader

def get_optimizer(model, lr):
    optim = torch_optim.Adam(model.parameters(), lr=lr, weight_decay=0.05)
    return optim

batch_size = 2000 

def train_loop(model, epochs, lr):
    total     = 0
    sum_loss  = 0
    criterion = nn.BCEWithLogitsLoss()
    optim = get_optimizer(model, lr)
    for epoch in range(epochs):
        model.train()
        for cat, y in train_dl:
            batch = y.shape[0]
            output = model(cat)
            if epoch == 1:
                # inspect the targets and the raw model outputs
                print(f'y: {y.float()}')
                print(f'output: {output[:, 0]}')
            loss = criterion(output[:, 0], y.float())
            optim.zero_grad()
            loss.backward()
            optim.step()
            total    += batch
            sum_loss += batch * loss.item()
        # evaluate on the validation set once per epoch
        valid_ds = ClassifierDataset(X_val, y_val, features)
        valid_dl = DataLoader(valid_ds, batch_size=X_val.shape[0], shuffle=False)  # whole set in one batch
        valid_dl = DeviceDataLoader(valid_dl, device)
        model.eval()
        with torch.no_grad():
            for cat, y in valid_dl:
                output = model(cat)
                valid_loss = criterion(output[:, 0], y.float())
        print(f'epoch: {epoch + 1}, training loss: {loss}, valid loss: {valid_loss}')

train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
train_dl = DeviceDataLoader(train_dl, device)
model = multiNet(embedding_sizes)
to_device(model, device)
model.apply(init_weights)
train_loop(model, epochs=120, lr=0.001)


epoch : 1,training loss : -127.1643,valid loss : -82.094856 
y:tensor([7., 3., 1.,  ..., 8., 7., 1.], device='cuda:0')
output:tensor([ 0.945,0.189,-1.194,...,-1.03,0.80,-1.05],device='cuda:0',grad_fn=<SelectBackward>)
epoch : 2,training loss : -298.340728,valid loss : -293.701477 
epoch : 3,training loss : -529.159423,valid loss : -535.595520 
epoch : 4,training loss : -882.299377,valid loss : -906.745788 

Try this:
torch.nn.functional.binary_cross_entropy
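
For example (a minimal sketch with made-up values; note that F.binary_cross_entropy expects probabilities, so the logits go through a sigmoid first, and the targets still have to be in [0, 1]):

import torch
import torch.nn.functional as F

logits  = torch.tensor([0.945, 0.189, -1.194])
targets = torch.tensor([1., 0., 1.])  # valid binary targets

loss = F.binary_cross_entropy(torch.sigmoid(logits), targets)
print(loss)  # non-negative, since the targets are in [0, 1]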

nn.BCEWithLogitsLoss expects the targets to be in the range [0, 1] as described in the docs. Since your targets contain values outside of this range, the loss could be negative.
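
A quick sketch with made-up values showing how targets outside [0, 1] can flip the sign:

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
logits = torch.tensor([2.0, 3.0, 1.5])

print(criterion(logits, torch.tensor([1., 0., 1.])))  # targets in [0, 1] -> non-negative loss
print(criterion(logits, torch.tensor([7., 8., 5.])))  # class indices outside [0, 1] -> negative loss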


@ptrblck thanks, much appreciated. That means my targets are between 0 and 8. To get these targets into the [0, 1] range, do I need to one-hot encode them as follows?


Class_0    Class_1    Class_2    Class_3    Class_4    Class_5    Class_6    Class_7    Class_8    Class_9
-------    -------    -------    -------    -------    -------    -------    -------    -------    -------
      0          0          1          0          0          0          0          0          0          0
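
Something like this is roughly what I mean (a small sketch, assuming the targets are integer class indices; F.one_hot is just one way to build the table above):

import torch
import torch.nn.functional as F

y = torch.tensor([7, 3, 1, 8])                    # integer class indices
y_onehot = F.one_hot(y, num_classes=10).float()   # shape [4, 10], rows like the table above
print(y_onehot[0])  # tensor([0., 0., 0., 0., 0., 0., 0., 1., 0., 0.])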

Based on the output it seems you are working on a multi-class classification (i.e. each sample has one target only), so you could directly use nn.CrossEntropyLoss.
On the other hand, if some samples have zero, one, or more active targets you would be working on a multi-label classification, could use nn.BCEWithLogitsLoss and would then multi-hot encode the target.
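
A minimal sketch of the two setups with made-up logits and targets (4 samples, 10 classes as in your table):

import torch
import torch.nn as nn

logits = torch.randn(4, 10)  # raw model outputs, no softmax/sigmoid applied

# multi-class: exactly one target class index per sample -> nn.CrossEntropyLoss
y_indices = torch.tensor([7, 3, 1, 8])
ce_loss = nn.CrossEntropyLoss()(logits, y_indices)

# multi-label: zero, one, or more active classes per sample -> multi-hot targets + nn.BCEWithLogitsLoss
y_multihot = torch.zeros(4, 10)
y_multihot[0, [2, 7]] = 1.  # sample 0 has classes 2 and 7 active
bce_loss = nn.BCEWithLogitsLoss()(logits, y_multihot)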


@ptrblck thanks, much appreciated.

@ptrblck why do we apply a softmax during prediction for a model that was trained with the nn.CrossEntropyLoss() loss function? As I read somewhere, CrossEntropyLoss already includes a softmax.

You shouldn’t use a softmax on the model outputs when you want to calculate the loss using nn.CrossEntropyLoss, since (as you’ve already said) nn.CrossEntropyLoss applies F.log_softmax and nn.NLLLoss internally, so pass the raw logits to this loss function instead.

On the other hand, you can apply a softmax on the model outputs (logits), if you want to “visualize” the probabilities or use them in any other way besides the input to nn.CrossEntropyLoss.
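
For example, a small sketch with made-up logits and targets:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits  = torch.randn(4, 10)             # raw model outputs (logits), no softmax applied
targets = torch.tensor([7, 3, 1, 8])

loss = nn.CrossEntropyLoss()(logits, targets)  # expects the raw logits

probs = F.softmax(logits, dim=1)  # softmax only to inspect/report probabilities
preds = probs.argmax(dim=1)       # same predicted classes as logits.argmax(dim=1)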
