Self-adaptive loss weights

I want to implement self-adaptive loss weights (call them lambda): a trainable weight lambda_i for each data point, instead of one constant lambda per dataset.
i.e., instead of

Total_loss = lambda1 * sum_i( loss_i )_dataset1 + lambda2 * sum_i( loss_i )_dataset2,

I want

Total_loss = sum_i( lambda1_i * loss_i )_dataset1 + sum_i( lambda2_i * loss_i )_dataset2
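To make the notation concrete, here is a minimal sketch of the second formula, assuming PyTorch (the tensor shapes and names are illustrative placeholders):

```python
import torch

# Per-point losses from two datasets (illustrative shapes and values).
loss_d1 = torch.randn(128).abs()  # loss_i for dataset 1
loss_d2 = torch.randn(64).abs()   # loss_i for dataset 2

# One trainable weight per data point instead of one scalar per dataset.
lambda1 = torch.nn.Parameter(torch.ones(128))
lambda2 = torch.nn.Parameter(torch.ones(64))

# Total_loss = sum_i( lambda1_i * loss_i ) + sum_i( lambda2_i * loss_i )
total_loss = (lambda1 * loss_d1).sum() + (lambda2 * loss_d2).sum()
```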

Is there any way to use the built-in Adam optimizer to optimize the lambdas so that they maximize the total loss (gradient ascent on lambda, while the network weights still minimize it)?
I implemented this by hand-coding the Adam update myself, and it works well for a single batch on a single GPU. However, my implementation cannot handle multiple batches or multiple GPUs.
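For reference, this is the pattern I have in mind (a sketch only, assuming PyTorch >= 1.11, where torch.optim.Adam accepts the maximize=True flag; the model, data, and learning rates are placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data; in practice these would come from the real datasets.
n_points = 1000
x = torch.randn(n_points, 2)
y = torch.randn(n_points)
indices = torch.arange(n_points)  # lets each batch locate its own lambdas
loader = DataLoader(TensorDataset(x, y, indices), batch_size=128, shuffle=True)

model = torch.nn.Linear(2, 1)

# One trainable weight per training point, stored for the full dataset
# so that every mini-batch can index into the same parameter.
lambdas = torch.nn.Parameter(torch.ones(n_points))

opt_net = torch.optim.Adam(model.parameters(), lr=1e-3)
# maximize=True (PyTorch >= 1.11) makes the built-in Adam ascend,
# so the lambdas are driven to maximize the total loss.
opt_lam = torch.optim.Adam([lambdas], lr=1e-2, maximize=True)

for xb, yb, idx in loader:
    point_loss = (model(xb).squeeze(-1) - yb) ** 2  # loss_i
    total_loss = (lambdas[idx] * point_loss).sum()  # sum_i lambda_i * loss_i

    opt_net.zero_grad()
    opt_lam.zero_grad()
    total_loss.backward()
    opt_net.step()  # descend on the network weights
    opt_lam.step()  # ascend on this batch's lambdas
```

My understanding is that maximize=True simply flips Adam into gradient ascent, so the same built-in optimizer can drive the lambdas up while a second Adam instance drives the network weights down, and storing one lambda per point for the whole dataset and indexing with the batch indices is what should make multiple batches work. For multiple GPUs I suspect the complication is that DistributedDataParallel averages gradients across ranks, which would mix the per-point lambda updates; keeping the lambdas outside the DDP-wrapped module and letting each rank update only the lambdas of its own data shard seems like the natural fix, but I have not verified this.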