Suppose I have loss = w1 * loss1 + w2 * loss2, where I have defined two learnable weights, one per loss:
Weightloss1 = torch.tensor([1.0], requires_grad=True)
Weightloss2 = torch.tensor([1.0], requires_grad=True)
opt1 = torch.optim.Adam(model.parameters(), ...)
opt2 = torch.optim.Adam([Weightloss1, Weightloss2], ...)

# training
while True:
    model.train()
    for X, Y in train_set:
        pred_Y = model(X)
        # each task loss is scaled by its learnable weight
        loss_1 = Weightloss1 * model.loss_fn_1(pred_Y, Y)
        loss_2 = Weightloss2 * model.loss_fn_2(pred_Y, Y)
        loss = (loss_1 + loss_2) / 2
        opt1.zero_grad()
        opt2.zero_grad()
        loss.backward(retain_graph=True)
        opt1.step()
        opt2.step()
My question is: when I call loss.backward(retain_graph=True), will PyTorch compute gradients w.r.t. w1 and w2 (i.e., Weightloss1 and Weightloss2) in addition to the model parameters? If so, how can I get access to them?
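For concreteness, this is how I would expect to read them after the backward pass; I am assuming here that, since both weights are leaf tensors with requires_grad=True, backward() populates their .grad attributes just like it does for model parameters:

# assumption: .grad is filled for these leaf tensors after backward()
loss.backward(retain_graph=True)
print(Weightloss1.grad)  # expecting a 1-element tensor
print(Weightloss2.grad)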
Also, does the order of the two step() calls matter? I don't believe it does, since each optimizer updates a disjoint set of tensors.
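Here is a small sanity check of that disjointness assumption (a hypothetical snippet, just comparing the identities of the tensors each optimizer manages):

# if the two optimizers hold disjoint tensors, swapping opt1.step() and
# opt2.step() should not change what either update reads or writes
model_ids = {id(p) for p in model.parameters()}
weight_ids = {id(Weightloss1), id(Weightloss2)}
assert model_ids.isdisjoint(weight_ids)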