I was trying to tailor-make the loss function to better reflect what I was trying to achieve.

As the results were not convincing, I tested a hand-written MSE loss against the nn.MSELoss and F.mse_loss versions:

- criterion1 = nn.MSELoss() and loss1 = criterion1(outputs, targets)
- loss2 = F.mse_loss(outputs, targets)
- criterion3 = AdjMSELoss() and loss3 = criterion3(outputs, targets)

where:

    class AdjMSELoss(nn.Module):
        def __init__(self):
            super(AdjMSELoss, self).__init__()

        def forward(self, outputs, targets):
            outputs = torch.squeeze(outputs)
            loss = (outputs - targets)**2
            return torch.mean(loss)

As long as I test this with two tensors outside of a backprop training loop, all three give exactly the same result. BUT when I train a model with these losses, the results diverge between the custom AdjMSELoss and the two built-ins, nn.MSELoss and F.mse_loss (which give identical results to each other).
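For reference, the standalone check described here might look like the minimal sketch below. The tensor sizes and values are made up for illustration, and both tensors are deliberately given the same shape [batch], which may not match what the model and the DataLoader actually produce during training. The helper loss_and_grad is a hypothetical name introduced only for this example; it compares the gradients as well as the loss values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdjMSELoss(nn.Module):
    """Hand-written MSE, as defined in the post."""

    def __init__(self):
        super().__init__()

    def forward(self, outputs, targets):
        outputs = torch.squeeze(outputs)
        loss = (outputs - targets) ** 2
        return torch.mean(loss)


def loss_and_grad(loss_fn, outputs, targets):
    """Return the loss value and its gradient w.r.t. a fresh copy of outputs."""
    out = outputs.detach().clone().requires_grad_(True)
    loss = loss_fn(out, targets)
    loss.backward()
    return loss.detach(), out.grad


torch.manual_seed(0)
outputs = torch.randn(8)  # illustrative predictions, shape [batch]
targets = torch.randn(8)  # illustrative targets, same shape as outputs

results = [
    loss_and_grad(nn.MSELoss(), outputs, targets),
    loss_and_grad(F.mse_loss, outputs, targets),
    loss_and_grad(AdjMSELoss(), outputs, targets),
]

# Compare every variant against the first one (nn.MSELoss)
values_match = all(torch.allclose(results[0][0], r[0]) for r in results)
grads_match = all(torch.allclose(results[0][1], r[1]) for r in results)
print(values_match, grads_match)
```

If both flags are True for same-shaped tensors, the three losses agree in value and in gradient, so any divergence during training would have to come from something that differs between this check and the training loop, such as the shapes of the tensors involved.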

NB: All trials share the exact same seed and are reproducible.

Could someone explain why (and advise on the best approach to create an efficient custom loss function)?

Thanks in advance.

The code for the custom loss function is given above.

The training code is as follows:

    import copy

    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.metrics import accuracy_score  # assumed source of accuracy_score


    def train_model1D(model, criterion, optimizer, epochs, learning, verbose, train_dl):
        model.train()
        torch.set_grad_enabled(True)
        total_step = len(train_dl)
        curr_lr = learning

        # Pick the secondary (monitoring) criterion and the history layout
        if str(criterion) != 'CrossEntropyLoss()':
            criterion2 = AdjRegrLoss()
            history = dict(train=[], train_regr=[])
        else:
            criterion2 = nn.CrossEntropyLoss()
            accuracy = 0
            history = dict(train=[], train_regr=[], acc=[])

        best_model_wts = copy.deepcopy(model.state_dict())
        best_loss = 1e64

        for epoch in range(epochs):
            train_losses = []
            trregr_losses = []
            y_preda = []
            y_truea = []

            for i, (inputs, targets) in enumerate(train_dl):
                # Forward pass and backprop on the main criterion
                y_pred = model(inputs)
                loss = criterion(y_pred, targets)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                train_losses.append(loss.item())

                # Secondary loss, tracked only for monitoring
                loss2 = criterion2(y_pred, targets)
                trregr_losses.append(loss2.item())

                if str(criterion) == 'CrossEntropyLoss()':
                    _, y_predb = torch.max(y_pred, 1)
                    y_predb = y_predb.cpu()
                    y_predb = y_predb.detach().numpy()
                    y_trueb = targets.cpu()
                    y_trueb = y_trueb.detach().numpy()
                    y_preda.extend(y_predb)
                    y_truea.extend(y_trueb)

            train_loss = np.mean(train_losses)
            trregr_loss = np.sum(trregr_losses)
            history['train'].append(train_loss)
            history['train_regr'].append(trregr_loss)

            if str(criterion) == 'CrossEntropyLoss()':
                accuracy = accuracy_score(y_preda, y_truea)
                history['acc'].append(accuracy)

            # Keep a copy of the weights with the lowest secondary loss
            if trregr_loss < best_loss:
                best_loss = trregr_loss
                best_model_wts = copy.deepcopy(model.state_dict())

            if ((verbose != 0) and (epoch == 0)):
                print('Epoch 0 - Train Loss = ' + str(round(train_loss, 6)) + ' // Cumulative error training = ' + str(round(trregr_loss, 6)))
            if ((verbose != 0) and (((epoch+1) % verbose) == 0)):
                print('Epoch ' + str(epoch+1) + ' - Train Loss = ' + str(round(train_loss, 6)) + ' // Cumulative error training = ' + str(round(trregr_loss, 6)))

        torch.set_grad_enabled(False)
        model.eval()
        return model, history

You'll notice that:

- I call torch.squeeze on the output to get a tensor of size [batch] instead of [batch, 1].
- I compute a second loss, trregr_loss, which is not used for backpropagation and is not relevant to the question. It sums |ytrue| over the samples where yhat has the opposite sign to ytrue.
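To make the effect of the squeeze in the first point concrete, here is a small shape check. It is only a sketch under an assumption that the post does not state explicitly: that the model emits [batch, 1] while the targets arrive as [batch]. Under that assumption, a subtraction without the squeeze broadcasts to [batch, batch], so the mean of the squared differences is generally a different number. The values are made up so that the arithmetic is easy to follow.

```python
import torch

batch = 4
# Assumed model output of shape [batch, 1] and assumed targets of shape [batch]
outputs = torch.arange(batch, dtype=torch.float32).reshape(batch, 1)
targets = torch.arange(batch, dtype=torch.float32)

# With the squeeze: element-wise difference of shape [batch]
diff_squeezed = torch.squeeze(outputs) - targets

# Without the squeeze: [batch, 1] - [batch] broadcasts to [batch, batch]
diff_broadcast = outputs - targets

print(diff_squeezed.shape, diff_broadcast.shape)

mse_squeezed = torch.mean(diff_squeezed ** 2)
mse_broadcast = torch.mean(diff_broadcast ** 2)
print(mse_squeezed.item(), mse_broadcast.item())
```

Here the predictions equal the targets, so the squeezed version gives 0, while the broadcast version averages (i - j)^2 over all 16 pairs and gives 2.5. If the shapes in the real training loop differ in this way, it may be worth checking whether the built-in losses print a warning about the target size differing from the input size.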

    class AdjRegrLoss(nn.Module):
        def __init__(self):
            super(AdjRegrLoss, self).__init__()

        def forward(self, outputs, labels):
            outputs = torch.squeeze(outputs)
            loss = torch.abs(labels)
            adj = torch.mul(outputs, labels)
            adj[adj > 0] = 0
            adj[adj < 0] = 1
            loss = loss * adj
            return torch.sum(loss)
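The same sign-mismatch penalty can also be written without in-place masking. The sketch below is an assumed alternative, not the code used in the runs described in this post, and the function name sign_mismatch_loss is made up for the example. It should return the same value as AdjRegrLoss, because products equal to zero contribute nothing in either version.

```python
import torch


def sign_mismatch_loss(outputs, labels):
    """Sum of |label| over samples whose prediction has the opposite sign.

    Hypothetical helper, meant to mirror AdjRegrLoss.forward from the post.
    """
    outputs = torch.squeeze(outputs)
    wrong_sign = (outputs * labels) < 0  # boolean mask, True where signs differ
    return torch.sum(torch.abs(labels) * wrong_sign)


outputs = torch.tensor([[0.5], [-1.0], [2.0], [0.0]])  # illustrative [batch, 1] predictions
labels = torch.tensor([1.0, 3.0, -0.5, 2.0])           # illustrative [batch] targets

# Signs differ for the 2nd and 3rd samples, so the penalty is 3.0 + 0.5
penalty = sign_mismatch_loss(outputs, labels)
print(penalty.item())
```

Building the mask with a comparison avoids modifying a tensor in place, which keeps the function easier to reason about if it is ever used for backpropagation rather than only for monitoring.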