I have fine-tuned a pre-trained model on multiple tasks, optimising the model with multiple loss functions for the different tasks.
How are these two approaches different in optimising the model?
Approach 1: → L = L1 + alpha * L2 + beta * L3; backpropagating the weighted total loss
Approach 2: → backpropagating the losses separately, i.e. L1, alpha * L2, and then beta * L3
Kindly explain these two ways of backpropagating the losses.
Both will yield the same results, but the second approach would call backward multiple times and would thus add some runtime overhead.
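Here is a minimal sketch (toy model, toy losses, and example weights alpha/beta are assumptions, not your actual setup) showing that both approaches accumulate the same gradients in .grad, because backward() adds each new gradient onto whatever is already stored:

import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)
x = torch.randn(8, 4)
alpha, beta = 0.5, 0.1

def task_losses(out):
    # three toy task losses computed from the same model output
    return out.mean(), out.pow(2).mean(), out.abs().mean()

# Approach 1: backpropagate the weighted total loss once
out = model(x)
l1, l2, l3 = task_losses(out)
(l1 + alpha * l2 + beta * l3).backward()
grads_total = [p.grad.clone() for p in model.parameters()]

# Approach 2: backpropagate each weighted loss separately; gradients accumulate in .grad
model.zero_grad()
out = model(x)
l1, l2, l3 = task_losses(out)
l1.backward(retain_graph=True)
(alpha * l2).backward(retain_graph=True)
(beta * l3).backward()
grads_separate = [p.grad.clone() for p in model.parameters()]

print(all(torch.allclose(a, b) for a, b in zip(grads_total, grads_separate)))  # expected: True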
Thank you for the reply.
I have fine-tuned the model with both approaches, accepting the runtime overhead of the second approach, but I get different results. What could be the possible reason?
I have backpropagated the losses in approach_2 like this →
loss_1.backward(retain_graph=True)
loss_2.backward(retain_graph=True)
loss_3.backward()
I don’t know, but I would recommend comparing the actual loss values first, then the gradients, etc. If you get stuck, minimize the code and post an executable code snippet reproducing the issue.
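For the gradient comparison, something along these lines could work (a rough sketch; grad_snapshot and compare_snapshots are hypothetical helpers, not an existing API):

import torch

def grad_snapshot(model):
    # copy the currently accumulated .grad tensors so they can be compared later
    return {name: p.grad.detach().clone()
            for name, p in model.named_parameters() if p.grad is not None}

def compare_snapshots(snap_a, snap_b):
    # print the largest absolute gradient difference per parameter
    for name in snap_a:
        diff = (snap_a[name] - snap_b[name]).abs().max().item()
        print(f"{name}: max abs grad diff = {diff:.3e}")

Run one forward/backward pass with approach 1 on a fixed batch, take a snapshot, zero the gradients, repeat the same forward pass with approach 2, take a second snapshot, and compare. Any difference beyond floating point noise suggests the losses or weights do not match between the two runs.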