How to optimize a model by combining multi-task losses

Hi, I am new to PyTorch and not quite sure if I am doing this right. Please tell me whether the following is right or wrong.

What I am doing:
I am training a GAN, but the generator is a model that learns a regression from some prepared input (not random noise). Let’s say we have two model blocks, a generator (G) and a discriminator (D), and three losses: the GAN loss for the discriminator (d_loss), the GAN loss for the generator (g_loss), and the regression loss for the generator (mse_loss). I would like to update G by combining g_loss and mse_loss.

My question:
It is straightforward to optimize the D model, which uses the GAN loss for the discriminator. However, I am not quite sure whether I am updating G correctly. My code is like:

import torch

input = ...           # prepared input (not random noise)
regress_label = ...   # regression target
G = generator()
D = discriminator()
D_optimizer = torch.optim.Adam(D.parameters())
G_optimizer = torch.optim.Adam(G.parameters())

# train G part
G_optimizer.zero_grad()
g_out = G(input)
d_g_out = D(g_out)
g_loss = loss(d_g_out, tell_D_real)   # GAN loss: labels tell D the generated output is "real"
mse_loss = mse(g_out, regress_label)  # regression loss
# combine GAN loss and regression loss
total_g_loss = mse_loss + g_loss      # can I just sum them up ???
total_g_loss.backward()
G_optimizer.step()                    # update G only relying on optimizer ???

Thank you very much !

The code looks correct. Note that total_g_loss.backward() would also calculate the gradients for D (if you haven’t set its requires_grad attributes to False), so you would need to call D.zero_grad() before updating D.
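
To make that concrete, here is a minimal sketch of one full training iteration. The tiny Sequential models, the BCELoss criterion, and the tensor shapes are placeholders I made up for illustration, not your actual generator and discriminator:

import torch
import torch.nn as nn

# stand-ins for the real generator()/discriminator() -- shapes are made up
G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
D = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

adv_loss = nn.BCELoss()   # assumed GAN criterion
mse = nn.MSELoss()        # regression criterion

D_optimizer = torch.optim.Adam(D.parameters())
G_optimizer = torch.optim.Adam(G.parameters())

inp = torch.randn(4, 16)            # prepared input (not random noise)
regress_label = torch.randn(4, 8)   # regression target
real_sample = torch.randn(4, 8)     # "real" data for the D update
real = torch.ones(4, 1)             # label: real
fake = torch.zeros(4, 1)            # label: fake

# ---- update D ----
D_optimizer.zero_grad()             # also clears gradients that the previous G backward left in D
d_real = D(real_sample)
d_fake = D(G(inp).detach())         # detach so this step does not backprop into G
d_loss = adv_loss(d_real, real) + adv_loss(d_fake, fake)
d_loss.backward()
D_optimizer.step()

# ---- update G ----
G_optimizer.zero_grad()
g_out = G(inp)
g_loss = adv_loss(D(g_out), real)   # "tell_D_real": G tries to make D output "real"
mse_loss = mse(g_out, regress_label)
total_g_loss = g_loss + mse_loss    # summing the two losses is fine; their gradients just add up
total_g_loss.backward()             # note: this also fills D's .grad, hence the zero_grad above
G_optimizer.step()                  # only updates G's parameters

Detaching G’s output in the D step keeps that update from backpropagating into G, and zeroing D’s gradients at the start of the D step discards whatever the previous total_g_loss.backward() left behind.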

@ptrblck Thank you very much!

I am thinking that different multi-task network architectures should need different backpropagation (BP) approaches. Here are two conditions:

Condition 1:
G(encoder → decoder) → loss_G
G(encoder → decoder) → D → loss_D
In this condition, we should sum loss_G and loss_D and backpropagate the total to update G.

Condition 2:
G(encoder → decoder) → loss_G
G(encoder) → G(decoder) → loss_G
      |_____> D → loss_D

In this condition, the two tasks are like two branches of a tree, and only G(encoder) is the common part. Both losses should be backpropagated to update G(encoder), but the G(decoder) part should only be updated by loss_G.
In this case, can we still feed loss_D + loss_G to the optimizer?

Thank you very much!

I’m not quite sure about the second approach, since loss_G is calculated twice.
Would the G(encoder → decoder) → loss_G pipeline still be used, or only G(encoder) → G(decoder) → loss_G to calculate loss_G?

Sorry for the confusion. In fact, I just want to know how to do BP for multi-task learning in PyTorch.

Just forget about the GAN; let’s say the architecture is:

block A → block B → loss_B
   |_____> block C → loss_C

The model parameters include blocks A, B, and C, the optimizer is torch.optim.Adam(model.parameters()), and the total loss is loss_B + loss_C. Is this the correct way to train the multi-task learning model?

Thank you.

Yes, your approach looks correct. Note that:

loss = loss_B + loss_C
loss.backward()

would accumulate the gradients in block A, since both loss terms were computed from outputs that passed through its parameters. I assume that fits your use case and is expected.
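
For completeness, here is a minimal runnable sketch of that block A/B/C setup. The layer sizes, module class, and MSE criteria are placeholders chosen for illustration, not your actual model:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.block_A = nn.Linear(16, 32)   # shared trunk
        self.block_B = nn.Linear(32, 8)    # head for task B
        self.block_C = nn.Linear(32, 4)    # head for task C

    def forward(self, x):
        shared = torch.relu(self.block_A(x))
        return self.block_B(shared), self.block_C(shared)

model = MultiTaskNet()
optimizer = torch.optim.Adam(model.parameters())

x = torch.randn(4, 16)
target_B = torch.randn(4, 8)
target_C = torch.randn(4, 4)

optimizer.zero_grad()
out_B, out_C = model(x)
loss_B = F.mse_loss(out_B, target_B)
loss_C = F.mse_loss(out_C, target_C)
loss = loss_B + loss_C
loss.backward()   # block_A gets gradients from both losses; each head only from its own
optimizer.step()

# sanity check: backpropagating loss_B alone leaves block_C without gradients
optimizer.zero_grad()
out_B, out_C = model(x)
F.mse_loss(out_B, target_B).backward()
print(model.block_C.weight.grad)   # None with the default zero_grad (set_to_none=True)

Summing the losses works because autograd simply follows the graph: gradients from loss_B never reach block_C (and vice versa), while block_A accumulates contributions from both.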