I am training a SAC (Soft Actor-Critic) agent and want to be able to resume training from a checkpoint. However, the loss explodes right after resuming.
In SAC the trainable parameters are an actor (also called the policy), a critic, and the entropy temperature parameter alpha.
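For reference, my alpha update follows the usual automatic-temperature scheme. A rough sketch of what runs inside my agent's update step (log_pi and target_entropy are assumptions about the surrounding code: log_pi is the log-probability of actions sampled from the current policy, and target_entropy is typically -action_dim):

    # Sketch of the standard automatic temperature update (Haarnoja et al. 2018),
    # as found in common PyTorch SAC implementations.
    alpha_loss = -(self.log_alpha * (log_pi + self.target_entropy).detach()).mean()
    self.alpha_optim.zero_grad()
    alpha_loss.backward()
    self.alpha_optim.step()
    # Keep the scalar alpha in sync with the optimized log_alpha.
    self.alpha = self.log_alpha.exp().item()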
Moreover, to resume training I also have to reload the experience replay buffer.
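Saving and restoring the buffer itself seems straightforward. A minimal sketch, assuming the buffer is a plain Python container of transition tuples (the helper names here are hypothetical, not my actual code):

    import pickle

    # Hypothetical helpers: persist the replay buffer as-is, assuming it is a
    # plain Python container of (state, action, reward, next_state, done) tuples.
    def save_replay(buffer, path):
        with open(path, "wb") as f:
            pickle.dump(buffer, f)

    def load_replay(path):
        with open(path, "rb") as f:
            return pickle.load(f)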
I think the issue is with the alpha parameter, as the loss for alpha jumps from near zero to about 40k after resuming. Am I loading it correctly?
    import torch
    from torch.optim import Adam

    def load_model(self, actor_path, critic_path, optimizer_actor_path,
                   optimizer_critic_path, optimizer_alpha_path):
        # The actor checkpoint bundles the policy weights together with
        # alpha and log_alpha.
        checkpoint = torch.load(actor_path)
        self.alpha = checkpoint['alpha'].detach().item()
        # Rebuild log_alpha as a fresh leaf tensor so it can be optimized,
        # then hand the new optimizer the stored state further below.
        self.log_alpha = torch.tensor([checkpoint['log_alpha'].detach().item()],
                                      requires_grad=True, device=self.device)
        self.alpha_optim = Adam([self.log_alpha], lr=self.lr)
        self.policy.load_state_dict(checkpoint['model_state_dict'])
        self.policy.train()
        self.critic.load_state_dict(torch.load(critic_path))
        self.critic.train()
        self.policy_optim.load_state_dict(torch.load(optimizer_actor_path))
        self.critic_optim.load_state_dict(torch.load(optimizer_critic_path))
        self.alpha_optim.load_state_dict(torch.load(optimizer_alpha_path))
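For context, the checkpoint is written roughly like this (a sketch reconstructed from the keys that load_model reads; the exact save code may differ):

    def save_model(self, actor_path, critic_path, optimizer_actor_path,
                   optimizer_critic_path, optimizer_alpha_path):
        # Sketch of the save side, inferred from the keys read in load_model.
        torch.save({'alpha': torch.tensor(self.alpha),
                    'log_alpha': self.log_alpha,
                    'model_state_dict': self.policy.state_dict()}, actor_path)
        torch.save(self.critic.state_dict(), critic_path)
        torch.save(self.policy_optim.state_dict(), optimizer_actor_path)
        torch.save(self.critic_optim.state_dict(), optimizer_critic_path)
        torch.save(self.alpha_optim.state_dict(), optimizer_alpha_path)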
Additional note: when I test them, the policy and critic output the same values as at the end of the previous training run, so the problem has to be in alpha or in the optimizers.
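A quick diagnostic I can run right after calling load_model, to compare the restored temperature and its Adam state against values logged before saving (a sketch; agent stands in for my agent instance):

    print("alpha:", agent.alpha, "log_alpha:", agent.log_alpha.item())
    # Adam keeps per-parameter state: a step count and two moment estimates.
    for state in agent.alpha_optim.state.values():
        print("step:", state["step"],
              "exp_avg:", state["exp_avg"].item(),
              "exp_avg_sq:", state["exp_avg_sq"].item())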