My main parameters (θ, φ) are updated by gradient descent, while a small MLP (the dual network, parameters ξ) is updated by gradient ascent. My approach was straightforward: keep two sets of optimizers, freeze the ascent parameters during the first backward pass (the descent step), then unfreeze them again for the ascent step.
— Step 1: Primal descent (update θ, φ) —
for p in dual_params:
    p.requires_grad_(False)  # freeze dual temporarily
optimizer_main.zero_grad()
total_loss = L_ret + ascent_loss  # keep full gradient flow
total_loss.backward()
optimizer_main.step()
for p in dual_params:
    p.requires_grad_(True)  # unfreeze for next step

— Step 2: Dual ascent (update ξ) —
optimizer_dual.zero_grad()
(-ascent_loss).backward()  # ascent step for dual
optimizer_dual.step()
This fails with: RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed).
Adding retain_graph=True instead produces an out-of-memory error. Please advise me on this. I am sure it is quite common to update some of a model's parameters with one loss and the rest with another.
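For reference, a common fix for this pattern is to give each backward pass its own graph: run a fresh forward pass for the ascent step instead of reusing tensors from the descent step, so no graph is consumed twice and nothing needs retain_graph=True. Below is a minimal sketch under that assumption; main_model, dual_mlp, and the data x, y are hypothetical stand-ins for the real setup, and the losses are toy placeholders:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real networks.
main_model = nn.Linear(4, 1)   # parameters θ, φ (descent)
dual_mlp = nn.Linear(4, 1)     # parameters ξ (ascent)

optimizer_main = torch.optim.SGD(main_model.parameters(), lr=1e-2)
optimizer_dual = torch.optim.SGD(dual_mlp.parameters(), lr=1e-2)

x = torch.randn(8, 4)
y = torch.randn(8, 1)

# --- Step 1: primal descent on θ, φ ---
optimizer_main.zero_grad()
L_ret = ((main_model(x) - y) ** 2).mean()  # toy primal loss
ascent_loss = dual_mlp(x).mean()           # toy dual term
total_loss = L_ret + ascent_loss
total_loss.backward()        # graph is freed here, which is fine
optimizer_dual.zero_grad()   # discard the gradients that leaked into ξ
optimizer_main.step()        # only θ, φ move

# --- Step 2: dual ascent on ξ, with a fresh forward pass ---
optimizer_dual.zero_grad()
ascent_loss = dual_mlp(x).mean()  # rebuild the graph for this backward
(-ascent_loss).backward()         # minimizing -loss == ascending loss
optimizer_dual.step()
```

Zeroing optimizer_dual after the first backward replaces the requires_grad_ freeze/unfreeze dance. If a second forward pass is too expensive, an alternative is a single backward on total_loss followed by negating each ξ gradient in place (p.grad.neg_() for p in dual_params) before optimizer_dual.step(); that performs ascent on ξ without building a second graph.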
