Gradient ascent on some parameters while descent on others in a single model

My main parameters are updated by gradient descent, while a small MLP is updated by gradient ascent. My approach was straightforward: keep two optimizers, freeze the ascent parameters during the first backward pass (descent), then unfreeze them again for the ascent step.

— Step 1: Primal descent (update θ, φ) —

for p in dual_params:
    p.requires_grad_(False)  # freeze dual temporarily

optimizer_main.zero_grad()
total_loss = L_ret + ascent_loss  # keep full gradient flow
total_loss.backward()
optimizer_main.step()

for p in dual_params:
    p.requires_grad_(True)  # unfreeze for next step

— Step 2: Dual ascent (update ξ) —

optimizer_dual.zero_grad()
(-ascent_loss).backward()  # ascent step for dual; this is where it fails
optimizer_dual.step()

With this I get: RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed).

Setting retain_graph=True gives an out-of-memory error instead. Please advise. I'm sure it's quite common to update some of a model's parameters with one loss and the rest with another.
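For context: the error occurs because the first backward() frees the graph, so the second backward() on ascent_loss has nothing to traverse. The cheapest fix that avoids retain_graph=True is to recompute the ascent loss with a fresh forward pass before the second backward. Also note that with two optimizers over disjoint parameter sets, the freeze/unfreeze dance isn't needed, since each optimizer only steps its own parameters. A toy sketch (the networks and losses are illustrative stand-ins, not the original model):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
theta = nn.Linear(4, 1)   # stand-in for the main (descent) parameters
xi = nn.Linear(4, 1)      # stand-in for the small dual (ascent) MLP
opt_main = torch.optim.SGD(theta.parameters(), lr=0.1)
opt_dual = torch.optim.SGD(xi.parameters(), lr=0.1)

x = torch.randn(8, 4)
y = torch.randn(8, 1)

def losses():
    # Hypothetical stand-ins for L_ret and ascent_loss.
    l_ret = ((theta(x) - y) ** 2).mean()
    l_asc = (theta(x) * xi(x)).mean()
    return l_ret, l_asc

# Step 1: descent on theta. backward() frees the graph afterwards.
opt_main.zero_grad()
l_ret, l_asc = losses()
(l_ret + l_asc).backward()
opt_main.step()

# Step 2: ascent on xi. Rebuild the graph by recomputing the loss
# instead of calling backward() on the already-freed one.
# zero_grad() also discards the stale grads xi picked up in step 1.
opt_dual.zero_grad()
_, l_asc = losses()
(-l_asc).backward()
opt_dual.step()
```

The second forward pass costs extra compute, but unlike retain_graph=True it does not hold the whole first graph in memory.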

Hi!

I think you can implement gradient descent + ascent in just 1 backward pass.
I would use backward hooks to multiply the gradient by a negative value, and possibly scale its magnitude down too, since gradient ascent often needs smaller update magnitudes to work properly.
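A minimal sketch of that single-backward idea, using per-parameter tensor hooks (all names here are illustrative; hooks on the module would also work):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
main_net = nn.Linear(4, 1)  # updated by descent
dual_net = nn.Linear(4, 1)  # updated by ascent

ascent_scale = 0.1  # ascent often wants a smaller effective step
for p in dual_net.parameters():
    # The hook runs during backward; its return value replaces the
    # gradient that gets accumulated into p.grad.
    p.register_hook(lambda g: -ascent_scale * g)

opt = torch.optim.SGD(
    [{"params": main_net.parameters(), "lr": 1e-2},
     {"params": dual_net.parameters(), "lr": 1e-2}]
)

x = torch.randn(8, 4)
loss = (main_net(x) * dual_net(x)).mean()

opt.zero_grad()
loss.backward()  # one pass: main gets +grad, dual gets -scaled grad
opt.step()
```

One backward pass, one optimizer, no retain_graph, and no freeze/unfreeze bookkeeping.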

For more references, you could look at the Gradient Reversal Layer ([1409.7495] Unsupervised Domain Adaptation by Backpropagation) and try to find implementations online (although it's very easy to re-implement yourself using just a hook).
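A minimal re-implementation in the spirit of that paper, as a custom autograd Function: identity in the forward pass, gradient multiplied by -lambda in the backward pass. Insert it between the shared features and the branch you want to ascend on:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient Reversal Layer: identity forward, -lambda * grad backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip (and scale) the gradient flowing into whatever produced x.
        # None corresponds to lambd, which needs no gradient.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Quick sanity check of the sign flip:
x = torch.ones(3, requires_grad=True)
y = grad_reverse(x, lambd=0.5).sum()
y.backward()
print(x.grad)  # tensor([-0.5000, -0.5000, -0.5000])
```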

Thanks, checking it out. Honestly, I don't think I'm doing it properly. Maybe I shouldn't call loss.backward() for the second loss, but rather update it like this one here. But then I'm not sure how to get that gradient of lambda_w(x).

Does L_clean depend on ω, θ, both or neither? Same question for L_robust.