Loss functions differences

I was trying to write a custom loss function to better reflect what I am trying to achieve.
As the results were not convincing, I tested a self-coded MSE loss against the nn and F versions:

  1. criterion1 = nn.MSELoss() and loss1 = criterion1(outputs, targets)
  2. loss2 = F.mse_loss(outputs, targets)
  3. criterion3 = AdjMSELoss() and loss3 = criterion3(outputs, targets)
    where :

class AdjMSELoss(nn.Module):
    def __init__(self):
        super(AdjMSELoss, self).__init__()

    def forward(self, outputs, targets):
        outputs = torch.squeeze(outputs)
        loss = (outputs - targets)**2
        return torch.mean(loss)

As long as I test this with 2 tensors outside of backprop training, it gives the exact same results… BUT when I train a model with these losses, the results diverge between the custom AdjMSELoss and the two built-in versions, nn.MSELoss and F.mse_loss (which give identical results to each other).
NB. All trials share the exact same seed and are reproducible :wink:

Could someone explain why (and advise on the best approach to create an efficient custom loss function) ?
THX in advance

The code for the custom loss function is given above.
The code for the training is as follows :

import copy
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import accuracy_score  # assumed source of accuracy_score

def train_model1D(model, criterion, optimizer, epochs, learning, verbose, train_dl):
    total_step = len(train_dl)
    curr_lr = learning
    # criterion2 is a secondary metric only; it is never back-propagated
    if str(criterion) != 'CrossEntropyLoss()':
        criterion2 = AdjRegrLoss()
        history = dict(train=[], train_regr=[])
    else:
        criterion2 = nn.CrossEntropyLoss()
        accuracy = 0
        history = dict(train=[], train_regr=[], acc=[])
    best_model_wts = copy.deepcopy(model.state_dict())
    best_loss = 1e64
    for epoch in range(epochs):
        train_losses = []
        trregr_losses = []
        y_preda = []
        y_truea = []
        for i, (inputs, targets) in enumerate(train_dl):
            y_pred = model(inputs)
            loss = criterion(y_pred, targets)
            loss2 = criterion2(y_pred, targets)
            if str(criterion) == 'CrossEntropyLoss()':
                _, y_predb = torch.max(y_pred, 1)
                y_predb = y_predb.cpu()
                y_predb = y_predb.detach().numpy()
                y_trueb = targets.cpu()
                y_trueb = y_trueb.detach().numpy()
                # assumed: collect per-batch predictions (missing from the post)
                y_preda.extend(y_predb)
                y_truea.extend(y_trueb)
            # assumed: standard update step and bookkeeping (missing from the post)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            train_losses.append(loss.item())
            trregr_losses.append(loss2.item())
        train_loss = np.mean(train_losses)
        trregr_loss = np.sum(trregr_losses)
        if str(criterion) == 'CrossEntropyLoss()':
            accuracy = accuracy_score(y_preda, y_truea)
        # keep the weights of the epoch with the lowest cumulative error
        if trregr_loss < best_loss:
            best_loss = trregr_loss
            best_model_wts = copy.deepcopy(model.state_dict())
        if ((verbose != 0) and (epoch == 0)):
            print('Epoch 0 - Train Loss = ' + str(round(train_loss, 6)) + ' // Cumulative error training = ' + str(round(trregr_loss, 6)))
        if ((verbose != 0) and (((epoch+1) % verbose) == 0)):
            print('Epoch ' + str(epoch+1) + ' - Train Loss = ' + str(round(train_loss, 6)) + ' // Cumulative error training = ' + str(round(trregr_loss, 6)))
    return model, history

You’ll notice that

  1. I squeeze the output with torch.squeeze to get a tensor of size [batch, ] instead of [batch, 1]
  2. I compute another loss, trregr_loss, that is not relevant here and is not used for the back propagation. It sums the absolute value of ytrue over the samples where yhat has the opposite sign of ytrue.

class AdjRegrLoss(nn.Module):
    def __init__(self):
        super(AdjRegrLoss, self).__init__()

    def forward(self, outputs, labels):
        outputs = torch.squeeze(outputs)
        loss = torch.abs(labels)
        adj = torch.mul(outputs, labels)
        adj[adj>0] = 0
        adj[adj<0] = 1
        loss = loss * adj
        return torch.sum(loss)
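
As a quick check of what this auxiliary metric returns, here is a minimal sketch that writes the same computation as a plain function, using a boolean mask instead of in-place assignment. The name `sign_mismatch_loss` and the sample values are illustrative only and do not come from the original post.

```python
import torch

def sign_mismatch_loss(outputs: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Sum of |label| over samples whose prediction has the opposite sign."""
    outputs = torch.squeeze(outputs)        # [N, 1] -> [N]
    wrong_sign = (outputs * labels) < 0     # True where the signs disagree
    return torch.sum(torch.abs(labels) * wrong_sign)

outputs = torch.tensor([[0.5], [-1.0], [2.0]])  # shape [3, 1]
labels = torch.tensor([1.0, 2.0, -3.0])         # shape [3]
result = sign_mismatch_loss(outputs, labels)
print(result)  # samples 2 and 3 disagree in sign: |2.0| + |-3.0| = 5.0
```

Because the mask is piecewise constant in `outputs`, this quantity carries no useful gradient, which fits its use as a monitoring metric rather than a training loss.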

Can you give a minimal code example that reproduces the error? And btw, if you want to implement the default init function of your custom class, the name should be surrounded by two underscores on each side, like def __init__(self, *args)

Hello David, I modified the original post with the extended code… I hope it helps :wink:
and, of course, my init() are indeed surrounded by 2 __, even if it is not evident from the code shared ?

NB. I don’t understand why the code does not appear as code with the indentation :open_mouth:

What is the shape of the output y_pred and of your labels targets? I don’t think you can apply both CrossEntropyLoss and MSELoss to the same pair of tensors. As the documentation states, the shapes of input and target for these loss functions should be:

  1. CrossEntropyLoss: input[N,C]; target[N] (where N is the batch size, and C is the number of classes.)
  2. MSELoss: input and target should be identical in shape, like [N,*].

So, only when C=1, it’s possible to use the two loss functions at the same time, but that would be meaningless for a classification task.
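
The shape conventions listed above can be checked with a small sketch. The batch size `N = 4` and class count `C = 3` are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

N, C = 4, 3  # illustrative batch size and number of classes

# Classification: logits of shape [N, C], integer class indices of shape [N].
logits = torch.randn(N, C)
class_targets = torch.randint(0, C, (N,))
ce = nn.CrossEntropyLoss()(logits, class_targets)

# Regression: predictions and targets share the same shape, here [N].
preds = torch.randn(N, 1).squeeze(1)
reg_targets = torch.randn(N)
mse = nn.MSELoss()(preds, reg_targets)

print(ce.shape, mse.shape)  # both losses reduce to a scalar
```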
BTW, if you want to paste code, surround it with three backticks (```) so the formatting won’t be changed.

Hello Harvey,

I don’t use the two loss functions at the same time. When the model is a regression, there is one neuron in the output layer and the loss function is nn.MSELoss or nn.L1Loss (MAE) or the custom AdjMSELoss.

When the model classifies, there are as many neurons in the output layer as there are classes, and the loss function is nn.CrossEntropy…

The function is generic and allows both uses (regression and classification), hence the confusion.

My issue is that, when I do a regression, I don’t get the same backward propagation with nn.MSELoss as with the custom AdjMSELoss, although they are supposed to be perfectly identical…

Can you confirm that targets is of shape [N,] instead of [N,1] when doing regression?
With your implementation of AdjMSELoss, I notice that if targets is of shape [N,1], then since you squeeze outputs (which becomes [N,]), the element-wise difference is wrongly broadcast to shape [N, N] before the mean is taken.

So, if this is the problem, just squeeze the targets and the loss would be identical to nn.MSELoss.

class AdjMSELoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, outputs, targets):
        outputs = torch.squeeze(outputs)
        targets = torch.squeeze(targets)
        loss = (outputs - targets)**2
        return torch.mean(loss)
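
To illustrate the broadcasting issue described above, here is a minimal sketch with made-up values, assuming `outputs` has already been squeezed to `[N]` while `targets` is still `[N, 1]`.

```python
import torch

outputs = torch.tensor([1.0, 2.0, 3.0])        # shape [3], already squeezed
targets = torch.tensor([[1.0], [2.0], [3.0]])  # shape [3, 1]

diff = outputs - targets                       # broadcasts to shape [3, 3]
print(diff.shape)

wrong = torch.mean(diff ** 2)                  # averages 9 pairwise terms
right = torch.mean((outputs - targets.squeeze(1)) ** 2)  # averages 3 terms
print(wrong.item(), right.item())
```

Here the predictions match the targets exactly, so the correctly shaped loss is zero, while the broadcast version is not, because it also compares every prediction with every other sample's target.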


Thanks for looking into this.

The outputs are of shape [N, 1] (therefore, I squeeze it) and the targets are of shape [N, ].

The 2 loss functions nn.MSELoss() and AdjMSELoss() provide the exact same numerical results when I test them on two tensors*… BUT they do not provide similar results when the backward propagation is applied with loss.backward(). And I don’t understand why and which one is the right one…

* I randomly created two numpy arrays of respective sizes [N, 1] and [N, ]. I converted them into Torch tensors and computed the nn.MSELoss and the AdjMSELoss. These were systematically identical. The problem only arises when I apply loss.backward()

I can’t reproduce what you described. In my test, when outputs and targets have the same shape, the results of AdjMSELoss() and MSELoss are identical, with or without loss.backward().
Here is my script

import torch
import torch.nn as nn

class TestModel(nn.Module):
    def __init__(self, idim: int):
        super().__init__()
        self.linear = nn.Linear(idim, 1)

    def forward(self, x: torch.Tensor):
        return self.linear(x)

class AdjMSELoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, outputs, targets):
        outputs = torch.squeeze(outputs)
        loss = (outputs - targets)**2
        return torch.mean(loss)

def test():
    model = TestModel(5)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
    criterion = nn.MSELoss()
    criterion2 = AdjMSELoss()

    for i in range(1):
        inputs = torch.randn(4, 5)
        targets = torch.randn(4)
        y_pred = model(inputs)

        loss = criterion(y_pred.squeeze(), targets)
        # invoke the backward function here
        loss.backward()

        loss2 = criterion2(y_pred, targets)
        print("iter {}: {} loss: {:.3e}".format(i, criterion, loss))
        print("iter {}: {} loss: {:.3e}".format(i, criterion2, loss2))

if __name__ == "__main__":
    test()

And the output is

>>> tensor(True)
>>> iter 0: MSELoss() loss: 3.582e-01
>>> iter 0: AdjMSELoss() loss: 3.582e-01
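
To compare the backward pass as well, one option is to compute the gradients of each loss with respect to the model parameters and check them against each other. The sketch below is an illustration under stated assumptions, not the original poster's setup: it uses a single `nn.Linear` layer and random data, and `grads_for`, `custom_mse`, `builtin_squeezed` and `builtin_unsqueezed` are hypothetical helper names. It also covers the case where `nn.MSELoss` receives the unsqueezed `[N, 1]` output together with `[N]` targets, as happens with `criterion(y_pred, targets)` in the training loop earlier in the thread; in that case PyTorch broadcasts the two tensors (and warns about the size mismatch), which may be one source of the reported discrepancy.

```python
import warnings
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(5, 1)      # stand-in regression model (assumption)
inputs = torch.randn(4, 5)
targets = torch.randn(4)     # shape [N]

def grads_for(loss_fn):
    """Gradients of loss_fn(model output, targets) w.r.t. the model parameters."""
    y_pred = model(inputs)   # shape [N, 1]
    loss = loss_fn(y_pred, targets)
    return torch.autograd.grad(loss, list(model.parameters()))

def custom_mse(outputs, targets):
    # Same computation as AdjMSELoss.forward
    return torch.mean((torch.squeeze(outputs) - targets) ** 2)

def builtin_squeezed(outputs, targets):
    return nn.MSELoss()(outputs.squeeze(1), targets)

def builtin_unsqueezed(outputs, targets):
    # [N, 1] vs [N]: broadcast to [N, N]; the size-mismatch warning is silenced here.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return nn.MSELoss()(outputs, targets)

g_custom = grads_for(custom_mse)
g_squeezed = grads_for(builtin_squeezed)
g_unsqueezed = grads_for(builtin_unsqueezed)

same = all(torch.allclose(a, b) for a, b in zip(g_custom, g_squeezed))
differ = not all(torch.allclose(a, b) for a, b in zip(g_custom, g_unsqueezed))
print(same, differ)
```

If the two flags come out as expected, squeezing the model output before passing it to `nn.MSELoss` (or making the targets `[N, 1]`) should bring the built-in and custom losses back in line during training.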