Optimizing one network using gradients backpropagated through another

I’m using reinforcement learning to train two networks: an actor network, denoted A, and a critic network, denoted Q.

The actor network A receives an input s and produces an action tensor, i.e. A(s) -> a.
The critic network Q receives the same input plus the action tensor a that A produced in the step above, and outputs a score,
i.e. Q(s, a) -> q.
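For concreteness, the two networks can be sketched roughly as below. This is only an illustration of the A(s) -> a and Q(s, a) -> q shapes: the class names `Actor` and `Critic`, the layer sizes, and the single flat input are placeholders I am assuming here, not my actual models (which take `patch` and `hmap`).

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for A and Q; dimensions and layers are assumptions.
class Actor(nn.Module):
    def __init__(self, state_dim=8, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 32), nn.ReLU(),
            nn.Linear(32, action_dim), nn.Tanh(),
        )

    def forward(self, s):
        return self.net(s)  # A(s) -> a

class Critic(nn.Module):
    def __init__(self, state_dim=8, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, s, a):
        # The critic sees the same input as the actor, plus the action.
        return self.net(torch.cat([s, a], dim=1))  # Q(s, a) -> q

torch.manual_seed(0)
actor, critic = Actor(), Critic()
s = torch.randn(4, 8)   # a batch of 4 inputs
a = actor(s)            # shape (4, 2)
q = critic(s, a)        # shape (4, 1), still connected to the actor's graph
```

Because `a` is passed into the critic without being detached, `q` depends on the actor's parameters through the autograd graph.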

Now I’m running backpropagation on these two networks. To optimize A, the code looks simple:

        ce_optimizer = torch.optim.Adam(tn.critic_e.parameters(), lr=tn.LR_C)
        ae_optimizer = torch.optim.Adam(tn.actor_e.parameters(), lr=tn.LR_A)
        ... # optimize the critic network, which is easy

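        # optimize the actor network
        ae_optimizer.zero_grad()
        ce_optimizer.zero_grad()  # clear critic gradients left over from the critic update above
        a_pred = tn.actor_e(patch, hmap)
        q_eval = tn.critic_e(patch, hmap, a_pred)
        actor_loss = -q_eval.mean()  # assumed objective: maximize Q, i.e. minimize -Q
        actor_loss.backward()  # gradients flow through the critic back into the actor
        ae_optimizer.step()  # note: stepping the actor's optimizer only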

Is the logic here correct? That is: do one forward pass through A and then Q, backpropagate the result, and once A has its gradients, simply call step on the actor's optimizer?
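To make the question concrete, here is a small self-contained sketch of the update I have in mind, with a check of which parameters actually change. The `nn.Linear` layers are placeholders for A and Q, and the loss `-q_eval.mean()` is an assumption (maximizing the critic's score); neither is my real setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
actor = nn.Linear(8, 2)        # placeholder for A
critic = nn.Linear(8 + 2, 1)   # placeholder for Q
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

s = torch.randn(4, 8)
actor_before = [p.detach().clone() for p in actor.parameters()]
critic_before = [p.detach().clone() for p in critic.parameters()]

actor_opt.zero_grad()
critic.zero_grad()                       # drop any stale critic gradients
a_pred = actor(s)                        # A(s) -> a
q_eval = critic(torch.cat([s, a_pred], dim=1))  # Q(s, a) -> q
actor_loss = -q_eval.mean()              # assumed objective: maximize Q
actor_loss.backward()                    # gradients flow through Q into A
actor_opt.step()                         # only the actor's parameters are updated

actor_changed = any(
    not torch.equal(b, p) for b, p in zip(actor_before, actor.parameters())
)
critic_unchanged = all(
    torch.equal(b, p) for b, p in zip(critic_before, critic.parameters())
)
critic_has_grad = all(p.grad is not None for p in critic.parameters())
```

In this sketch the critic's weights stay fixed because its optimizer is never stepped, but `backward()` still fills in the critic's `.grad` fields, so they would need to be zeroed before the next critic update.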