Hi all, I'm trying to accomplish an event detection task with two models, where model_a produces the event tag and model_b produces the event localization (i.e., a label at each time frame). Basically, I'm trying to update the two models according to their combined loss.

I tried the following code, which runs successfully, but I am not entirely sure about the loss.backward() part:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model_a = tkmodel_a()
model_b = tkmodel_b()

criterion = nn.BCELoss()  # binary cross entropy

optimizer_a = optim.Adam(model_a.parameters(), lr=0.001, betas=(0.9, 0.999),
                         eps=1e-08, weight_decay=0., amsgrad=True)
optimizer_b = optim.Adam(model_b.parameters(), lr=0.001, betas=(0.9, 0.999),
                         eps=1e-08, weight_decay=0., amsgrad=True)

if torch.cuda.is_available():
    model_a.cuda()
    model_b.cuda()

# ... function to do prediction etc., producing predicted_a and predicted_b ...

loss_a = criterion(predicted_a, event_label)
loss_b = criterion(predicted_b, localization_label)

# both "combined" losses are the same sum of the two losses
combined_loss_a = loss_a + loss_b
combined_loss_b = loss_a + loss_b

optimizer_a.zero_grad()
combined_loss_a.backward(retain_graph=True)  # retain the graph for the second backward
optimizer_a.step()

optimizer_b.zero_grad()
combined_loss_b.backward()
optimizer_b.step()
```
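Note that retain_graph=True belongs on the first backward() call, not on zero_grad() (which takes no such argument): by default the computation graph is freed after backward(), so the second backward() through the same graph would otherwise raise a RuntimeError.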

I have read in several posts that there is no need to do it this way and that I can simply use

```python
combined_loss = loss_a + loss_b
combined_loss.backward()
```

and, based on the discussion at Optimizing based on another model's output, if I do the following:

```python
optimizer_strong.zero_grad()
optimizer_weak.zero_grad()
combined_strong_loss.backward()
optimizer_strong.step()
optimizer_weak.step()
```

then I would be updating both model_a and model_b with the combined loss while calling backward() only once. Can anyone confirm this?
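For concreteness, here is a minimal self-contained sketch of that single-backward pattern (the toy models, shapes, and random data below are made up purely for illustration): one backward() call on the summed loss populates the gradients of both models' parameters, after which each optimizer can step.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# toy stand-ins for model_a / model_b (hypothetical architectures and shapes)
model_a = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())   # event tag
model_b = nn.Sequential(nn.Linear(16, 8), nn.Sigmoid())   # per-frame labels

criterion = nn.BCELoss()
optimizer_a = optim.Adam(model_a.parameters(), lr=0.001)
optimizer_b = optim.Adam(model_b.parameters(), lr=0.001)

x = torch.randn(4, 16)                     # random input batch
event_label = torch.rand(4, 1).round()     # random binary targets
localization_label = torch.rand(4, 8).round()

optimizer_a.zero_grad()
optimizer_b.zero_grad()

combined_loss = (criterion(model_a(x), event_label)
                 + criterion(model_b(x), localization_label))

# a single backward pass populates gradients in BOTH models,
# because the summed loss depends on the parameters of both
combined_loss.backward()
print(all(p.grad is not None for p in model_a.parameters()))  # True
print(all(p.grad is not None for p in model_b.parameters()))  # True

optimizer_a.step()
optimizer_b.step()
```

As far as I can tell, both gradient checks print True here, which is what makes me think the single backward() is sufficient.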