Audio style transfer: combine a temporal and a spectral loss function


I am trying to implement two loss functions in the training script for my convolutional models.
I use a dataset of input/target pairs recorded at the input and output stages of an analog device.
This training script is part of a larger project that compares different model architectures on the same task.

criterion1: torch.nn.L1Loss()
criterion2: auraloss.freq.MultiResolutionSTFTLoss()

As optimizer, I am using Adam and as scheduler ReduceLROnPlateau.
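A minimal sketch of that optimizer/scheduler wiring; the one-layer model here is a stand-in for the actual convolutional model, and the learning rate, factor, and patience values are illustrative assumptions:

```python
import torch

# Stand-in for the actual convolutional model (assumed, not from the post).
model = torch.nn.Conv1d(1, 1, kernel_size=3, padding=1)

criterion1 = torch.nn.L1Loss()
# criterion2 = auraloss.freq.MultiResolutionSTFTLoss()  # spectral loss, as above

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# ReduceLROnPlateau lowers the LR only when the monitored metric plateaus,
# so it is typically stepped with the validation loss once per epoch.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=10
)
```

`scheduler.step(val_loss)` would then be called at the end of each validation epoch.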

pred = model(input)

loss1 = criterion1(pred, target)
loss = loss1
if config['criterion2'] is not None:
    loss2 = criterion2(pred, target)
    loss = loss1 + loss2


train_loss += loss.item()
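For context, the snippet completed into one full training step; the model, tensor shapes, and the spectral criterion are stand-ins I am assuming for illustration (a plain MSE replaces auraloss' MultiResolutionSTFTLoss so the sketch is self-contained):

```python
import torch

# Stand-ins: a one-layer model and random data in place of the real dataset.
model = torch.nn.Conv1d(1, 1, kernel_size=9, padding=4)
criterion1 = torch.nn.L1Loss()
criterion2 = torch.nn.MSELoss()  # stand-in for the spectral loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

input = torch.randn(4, 1, 1024)   # hypothetical batch of mono audio clips
target = torch.randn(4, 1, 1024)

model.train()
optimizer.zero_grad()
pred = model(input)
loss = criterion1(pred, target) + criterion2(pred, target)
loss.backward()           # gradients from both losses accumulate here
optimizer.step()
train_loss = loss.item()  # .item() detaches, so logging does not hold the graph
```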


  • Is this the right way to do it during training?
  • Do you have any suggestions on how to choose weights so that both loss functions affect the backward pass equally?

Thank you for your help

Any suggestions on that?

It is one way of doing it. The problem with this approach is that both gradients should be of similar magnitude, or one of the losses will effectively be “ignored”.
Another possibility would be

loss = 0.5*loss1 + 0.5*loss2

Or even

loss = 0.7*loss1 + 0.3*loss2

This can become fiddly, especially when the losses contradict each other, as in GANs, where you want them to reach an equilibrium.
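One way to make the weight choice less of a guess is to estimate each loss's gradient norm on a probe batch and weight each loss by the inverse of that norm, so both contribute comparably to the backward pass. A sketch with stand-in model and criteria; the `grad_norm` helper is hypothetical:

```python
import torch

# Stand-ins for the real model and the two criteria from the post.
model = torch.nn.Conv1d(1, 1, kernel_size=9, padding=4)
criterion1 = torch.nn.L1Loss()
criterion2 = torch.nn.MSELoss()  # stand-in for the spectral loss

x = torch.randn(4, 1, 256)  # hypothetical probe batch
y = torch.randn(4, 1, 256)

def grad_norm(loss):
    # Norm of the gradient of `loss` w.r.t. the model parameters.
    grads = torch.autograd.grad(loss, model.parameters(), retain_graph=True)
    return torch.sqrt(sum(g.pow(2).sum() for g in grads))

pred = model(x)
g1 = grad_norm(criterion1(pred, y))
g2 = grad_norm(criterion2(pred, y))

# Inverse-gradient-norm weights: each weighted loss then has roughly
# unit gradient norm, so neither dominates the update.
w1, w2 = 1.0 / (g1 + 1e-8), 1.0 / (g2 + 1e-8)
loss = w1 * criterion1(pred, y) + w2 * criterion2(pred, y)
```

In practice the weights are usually estimated once (or re-estimated occasionally) and then frozen, rather than recomputed every step.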

Another approach could be to use multiple optimizers, but this becomes equally fiddly.
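One reading of the multiple-optimizer idea is to alternate updates, each optimizer driven by one loss with its own learning rate; the schedule, learning rates, and stand-in criteria below are all assumptions for illustration:

```python
import torch

# Stand-ins for the real model and criteria.
model = torch.nn.Conv1d(1, 1, kernel_size=9, padding=4)
criterion1 = torch.nn.L1Loss()
criterion2 = torch.nn.MSELoss()  # stand-in for the spectral loss

# Two optimizers over the same parameters; each keeps its own Adam state,
# which is part of what makes this approach fiddly.
opt1 = torch.optim.Adam(model.parameters(), lr=1e-3)
opt2 = torch.optim.Adam(model.parameters(), lr=1e-4)

x = torch.randn(4, 1, 256)
y = torch.randn(4, 1, 256)

for step in range(4):
    # Alternate: even steps update on the temporal loss, odd on the spectral one.
    opt = opt1 if step % 2 == 0 else opt2
    crit = criterion1 if step % 2 == 0 else criterion2
    opt.zero_grad()
    crit(model(x), y).backward()
    opt.step()
```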