I'm training an ensemble of networks and wanted to know if there are any possible speed improvements to the code I wrote, following this scheme:
So I have an ensemble of N (= 10) critic networks, each with its own target network.
Calculating the loss is quite easy, but then I have to do the backward pass… currently my update loop looks like this:
```python
# Compute critic losses and update critics
for critic, optim, target in zip(self.critics, self.optims, self.target_critics):
    Q = critic(states, actions).cpu()
    Q_loss = F.mse_loss(Q, Q_targets)

    # Update critic
    optim.zero_grad()
    Q_loss.backward()
    optim.step()

    # Soft update of the target network
    self.soft_update(critic, target)
```
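For context, `soft_update` here is just the usual Polyak averaging of the target toward the critic, roughly like this (the `tau=0.005` is a placeholder value):

```python
import torch

@torch.no_grad()
def soft_update(self, critic, target, tau=0.005):
    # Polyak averaging: target <- tau * critic + (1 - tau) * target
    for p, p_targ in zip(critic.parameters(), target.parameters()):
        p_targ.mul_(1.0 - tau).add_(tau * p)
```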
What concerns me especially is needing 10 separate optimizers. Is there an easier way to do this?
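One idea I had is to merge everything into a single optimizer over all critics' parameters and sum the per-critic losses, so there is only one `backward()`/`step()` per update. A rough, untested sketch (Adam with a placeholder `lr=1e-3`, and assuming `Q_targets` already lives on the same device as the critics, so the `.cpu()` transfer goes away):

```python
import torch
import torch.nn.functional as F

# In __init__: one optimizer over the parameters of every critic.
params = [p for c in self.critics for p in c.parameters()]
self.optim = torch.optim.Adam(params, lr=1e-3)  # lr is a placeholder

# In the update step: sum the losses and do a single backward/step.
# Since the critics share no parameters, each critic receives exactly
# the gradient of its own MSE term.
total_loss = sum(
    F.mse_loss(critic(states, actions), Q_targets)
    for critic in self.critics
)
self.optim.zero_grad()
total_loss.backward()
self.optim.step()

# Soft updates stay per-network.
for critic, target in zip(self.critics, self.target_critics):
    self.soft_update(critic, target)
```

Would that be equivalent, and would it actually be faster? Or does a real speedup require batching the forward passes across the ensemble too (e.g. with `torch.vmap`)?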
Thanks a lot!