I am trying to design an ensemble loss function for an autoencoder that minimizes the loss on the target class while maximizing the loss on the non-target class.
My current loss function works fine when I minimize only target_loss, but it starts giving nan values when I add the non_target_loss term to the overall loss.
target_loss = MSE(input, output) + sparsity_term
non_target_loss = MSE(input, output_hat) + sparsity_term_hat
# I subtract because the second term has to be maximized.
loss = target_loss - (Coeff) * non_target_loss
I am unsure about my design. Any suggestions for a better one?
This is to be expected. mse_loss() is bounded below (by zero), so
minimizing it won’t cause anything to diverge – the most you can do
is drive it to zero.
However, because mse_loss() is unbounded above, your non-target
term, basically -mse_loss(), is unbounded below, so minimizing the
non-target term can and will diverge, hence the NaNs.
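To make the divergence concrete, here is a minimal sketch, not your actual model: the "autoencoder" is assumed to be a single scalar weight w that reconstructs a fixed input x as w * x, and the names x, w, opt and history are made up for illustration. Minimizing the negated MSE with plain SGD pushes the loss further and further below zero at every step.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy setup: one scalar weight "reconstructs" a fixed input.
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor(0.5, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

history = []
for step in range(50):
    opt.zero_grad()
    # Maximizing the MSE is done by minimizing its negative,
    # which has no lower bound.
    loss = -F.mse_loss(w * x, x)
    loss.backward()
    opt.step()
    history.append(loss.item())

# The loss keeps decreasing without limit.
print(history[0], history[-1])
```

Run for more steps and the values leave the range that floating point can represent, which is presumably where the nan values in the real model come from.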
Assuming that your basic approach makes sense, you could consider
running your non_target_loss through something like a sigmoid()
before subtracting it from your total loss:
loss = target_loss - (Coeff) * torch.sigmoid(non_target_loss)
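As a rough sketch of how this could look in code (the function name combined_loss, its arguments, and the default coeff are assumptions for illustration, and the sparsity terms from the question are left out for brevity):

```python
import torch
import torch.nn.functional as F

def combined_loss(target_in, target_out, non_target_in, non_target_out, coeff=1.0):
    """Hypothetical helper: target MSE minus a sigmoid-squashed non-target MSE."""
    target_loss = F.mse_loss(target_out, target_in)
    non_target_loss = F.mse_loss(non_target_out, non_target_in)
    # sigmoid() maps the unbounded non-target loss into (0, 1),
    # so the subtracted term can never exceed coeff.
    return target_loss - coeff * torch.sigmoid(non_target_loss)

# Even an absurdly bad non-target reconstruction gives a finite loss.
x = torch.zeros(4, 8)
bad_reconstruction = torch.full((4, 8), 1e6)
loss = combined_loss(x, x, x, bad_reconstruction, coeff=1.0)
print(loss.item())
```

One thing to keep in mind with this sketch: because the MSE is non-negative, the sigmoid term only ranges over [0.5, 1], and its gradient shrinks as non_target_loss grows, so Coeff may need retuning.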
This total loss will still penalize your model for matching the non-target
class. non_target_loss can still become arbitrarily large, but, because
it is “softly” clipped by the sigmoid(), it won’t ever cause your total
loss to become arbitrarily negative, and your training (almost certainly)