Audio style transfer: combine a temporal and a spectral loss function


I am trying to combine two loss functions in my training script for convolutional models.
I train on a dataset of input/target pairs recorded at the input and output stages of an analog device.
This training script is part of a larger project comparing different model architectures on the same task.

criterion1: torch.nn.L1Loss()
criterion2: auraloss.freq.MultiResolutionSTFTLoss()

I use Adam as the optimizer and ReduceLROnPlateau as the scheduler.
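For completeness, this is roughly how I set them up (the learning rate and scheduler settings here are placeholders, not my actual values):

```python
import torch

# stand-in model; the real one is a convolutional network
model = torch.nn.Conv1d(1, 1, kernel_size=3, padding=1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=5
)

# after each validation epoch:
# scheduler.step(val_loss)
```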

pred = model(input)

loss1 = criterion1(pred, target)
if config['criterion2'] is not None:
    loss2 = criterion2(pred, target)
    loss = loss1 + loss2
else:
    loss = loss1

train_loss += loss.item()
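For reference, here is a sketch of one balancing strategy I am considering: rescaling the spectral term by the detached ratio of the two loss magnitudes so that both contribute comparably. The `spectral_loss` below is only a stand-in for `auraloss.freq.MultiResolutionSTFTLoss`, and the detach-based normalization is my own assumption, not something from auraloss:

```python
import torch

torch.manual_seed(0)

l1 = torch.nn.L1Loss()

def spectral_loss(pred, target):
    # stand-in for auraloss.freq.MultiResolutionSTFTLoss, which typically
    # sits on a different numerical scale than the time-domain L1 loss
    return 10.0 * l1(pred, target)

pred = torch.randn(4, 1, 1024, requires_grad=True)
target = torch.randn(4, 1, 1024)

loss1 = l1(pred, target)
loss2 = spectral_loss(pred, target)

# scale loss2 so its magnitude matches loss1; detach() keeps the
# scaling factor itself out of the gradient computation
w2 = (loss1.detach() / loss2.detach().clamp(min=1e-8)).item()
loss = loss1 + w2 * loss2
loss.backward()
```

Note that this matches the loss magnitudes, not the gradient norms, so it is only a rough form of balancing.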


  • Is this the right way to combine the two losses during training?
  • Do you have any suggestions on how to choose weights so that both loss functions affect the backward pass equally?

Thank you for your help