Best practice for integrated vs separate optimizers in actor-critic models

Suppose we have two separate networks for the actor and the critic, actor_net and critic_net, with no shared modules or layers. Which of the two approaches below is the better way to run the optimization, in terms of best practices and conventions?

1. Separate:

class ActorCritic:
    def __init__(self, *args):
        ...
        self.actor_optim = Adam(self.actor_net.parameters(), lr=self.lr_actor)
        self.critic_optim = Adam(self.critic_net.parameters(), lr=self.lr_critic)
        ...

    def learn(self, *args):
        for step in range(total_steps):
            ... # collect rollout
            for update in range(num_updates_per_step):
                ... # forward pass
                actor_loss = ...
                critic_loss = nn.functional.mse_loss(..., ...)

                # backprop for actor network
                # (retain_graph=True is only needed if the two losses share part of the graph)
                self.actor_optim.zero_grad()
                actor_loss.backward(retain_graph=True)
                self.actor_optim.step()

                # backprop for critic network
                self.critic_optim.zero_grad()
                critic_loss.backward()
                self.critic_optim.step()
                ...

or

2. Integrated:

class ActorCritic:
    def __init__(self, *args):
        ...
        self.optim = optim.Adam([
            {'params': self.actor_net.parameters(), 'lr': self.lr_actor},
            {'params': self.critic_net.parameters(), 'lr': self.lr_critic}
        ])
        ...

    def learn(self, *args):
        for step in range(total_steps):
            ... # collect rollout
            for update in range(num_updates_per_step):
                ... # forward pass
                actor_loss = ...
                critic_loss = nn.functional.mse_loss(..., ...)

                # compute the total loss
                total_loss = actor_loss + critic_loss

                self.optim.zero_grad()
                total_loss.backward()
                self.optim.step()
                ...

First of all, are both logically correct? If so, does it make a difference which one is used, and which is preferred?

I haven’t experimented on it yet, but plan to do so. I wanted to see what the community’s opinion is on it.
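
A minimal sketch of such an experiment, under assumptions that are not part of the code above: two tiny hypothetical linear networks, random data, and an advantage that is detached so that actor_loss does not depend on the critic's parameters. In that setting, one update with (1) and one update with (2) should leave the parameters the same up to floating-point tolerance:

```python
import copy
import torch
from torch import nn
from torch.optim import Adam

torch.manual_seed(0)

def compute_losses(actor, critic, obs, actions, returns):
    # Hypothetical losses: policy-gradient style actor loss, MSE critic loss.
    values = critic(obs).squeeze(-1)
    advantage = (returns - values).detach()  # detached: actor loss never reaches the critic
    log_probs = torch.log_softmax(actor(obs), dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    actor_loss = -(chosen * advantage).mean()
    critic_loss = nn.functional.mse_loss(values, returns)
    return actor_loss, critic_loss

# Random stand-in data for a rollout (assumption, for illustration only).
obs = torch.randn(8, 4)
actions = torch.randint(0, 2, (8,))
returns = torch.randn(8)

# Two identical copies of the networks, one per approach.
actor_a, critic_a = nn.Linear(4, 2), nn.Linear(4, 1)
actor_b, critic_b = copy.deepcopy(actor_a), copy.deepcopy(critic_a)

# (1) Separate optimizers.
actor_optim = Adam(actor_a.parameters(), lr=1e-3)
critic_optim = Adam(critic_a.parameters(), lr=1e-2)
actor_loss, critic_loss = compute_losses(actor_a, critic_a, obs, actions, returns)
actor_optim.zero_grad()
actor_loss.backward()   # graphs are disjoint here, so retain_graph is not needed
actor_optim.step()
critic_optim.zero_grad()
critic_loss.backward()
critic_optim.step()

# (2) One optimizer with two parameter groups and a summed loss.
optim = Adam([
    {"params": actor_b.parameters(), "lr": 1e-3},
    {"params": critic_b.parameters(), "lr": 1e-2},
])
actor_loss, critic_loss = compute_losses(actor_b, critic_b, obs, actions, returns)
optim.zero_grad()
(actor_loss + critic_loss).backward()
optim.step()

params_a = list(actor_a.parameters()) + list(critic_a.parameters())
params_b = list(actor_b.parameters()) + list(critic_b.parameters())
same = all(torch.allclose(p, q, atol=1e-6) for p, q in zip(params_a, params_b))
print("same parameters after one update:", same)
```

If actor_loss did depend on the critic's parameters (for example, a non-detached advantage), the two approaches would no longer match: in (2) the critic would also receive gradients from actor_loss, whereas in (1) those gradients are cleared by critic_optim.zero_grad() before critic_loss.backward().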

In torchrl we made some effort to use (2) instead of (1), unless the optimizers have to be called separately (e.g., you need two different optimizer classes, or one is called more often than the other).
Example here
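
To illustrate the "one is called more often than the other" case, here is a rough sketch. It is not taken from torchrl; the networks, the random data, and the name actor_update_every are all hypothetical. The critic is updated on every iteration and the actor only on every second one, which is awkward to express with a single optimizer.step():

```python
import torch
from torch import nn
from torch.optim import Adam

torch.manual_seed(0)

actor_net = nn.Linear(4, 1)    # hypothetical deterministic policy: obs -> action
critic_net = nn.Linear(5, 1)   # hypothetical Q-function: (obs, action) -> value

actor_optim = Adam(actor_net.parameters(), lr=1e-3)
critic_optim = Adam(critic_net.parameters(), lr=1e-3)

actor_update_every = 2         # assumption: actor is updated half as often as the critic
actor_updates = 0
critic_updates = 0

for update in range(6):
    # Random stand-ins for a sampled batch (illustration only).
    obs = torch.randn(8, 4)
    action = torch.randn(8, 1)
    target_q = torch.randn(8, 1)

    # Critic step on every iteration.
    q = critic_net(torch.cat([obs, action], dim=-1))
    critic_loss = nn.functional.mse_loss(q, target_q)
    critic_optim.zero_grad()   # also clears critic grads left over from an actor step
    critic_loss.backward()
    critic_optim.step()
    critic_updates += 1

    # Actor step only every `actor_update_every` iterations.
    if (update + 1) % actor_update_every == 0:
        actor_loss = -critic_net(torch.cat([obs, actor_net(obs)], dim=-1)).mean()
        actor_optim.zero_grad()
        actor_loss.backward()  # fills critic grads too, but only the actor is stepped
        actor_optim.step()
        actor_updates += 1

print(critic_updates, actor_updates)
```

With a single optimizer over both parameter groups, skipping the actor on some iterations would need extra handling (for instance, making sure the actor's gradients are None on those iterations), so two optimizers keep this case simpler.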