I am training a SAC (Soft Actor-Critic) agent and want to be able to resume training from a checkpoint. However, the loss explodes right after resuming.
In SAC the trainable parameters are an actor (also called the policy), a critic, and the entropy temperature parameter alpha.
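For reference, my alpha update follows the usual automatic-temperature scheme. A rough sketch of what runs inside my agent's update step (log_pi and target_entropy are assumptions about the surrounding code: log_pi is the log-probability of actions sampled from the current policy, and target_entropy is typically -action_dim):

    # Sketch of the standard automatic temperature update (Haarnoja et al. 2018),
    # as found in common PyTorch SAC implementations.
    alpha_loss = -(self.log_alpha * (log_pi + self.target_entropy).detach()).mean()
    self.alpha_optim.zero_grad()
    alpha_loss.backward()
    self.alpha_optim.step()
    # Keep the scalar alpha in sync with the optimized log_alpha.
    self.alpha = self.log_alpha.exp().item()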
Moreover, to resume training I also have to reload the experience replay buffer.
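Saving and restoring the buffer itself seems straightforward. A minimal sketch, assuming the buffer is a plain Python container of transition tuples (the helper names here are hypothetical, not my actual code):

    import pickle

    # Hypothetical helpers: persist the replay buffer as-is, assuming it is a
    # plain Python container of (state, action, reward, next_state, done) tuples.
    def save_replay(buffer, path):
        with open(path, "wb") as f:
            pickle.dump(buffer, f)

    def load_replay(path):
        with open(path, "rb") as f:
            return pickle.load(f)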
I think the issue is with the alpha parameter, as the loss for alpha jumps from near zero to about 40k after resuming. Am I loading it correctly?
    import torch
    from torch.optim import Adam

    def load_model(self, actor_path, critic_path, optimizer_actor_path,
                   optimizer_critic_path, optimizer_alpha_path):
        # The actor checkpoint bundles the policy weights together with
        # alpha and log_alpha.
        checkpoint = torch.load(actor_path)
        self.alpha = checkpoint['alpha'].detach().item()
        # Rebuild log_alpha as a fresh leaf tensor so it can be optimized,
        # then hand the new optimizer the stored state further below.
        self.log_alpha = torch.tensor([checkpoint['log_alpha'].detach().item()],
                                      requires_grad=True, device=self.device)
        self.alpha_optim = Adam([self.log_alpha], lr=self.lr)
        self.policy.load_state_dict(checkpoint['model_state_dict'])
        self.policy.train()
        self.critic.load_state_dict(torch.load(critic_path))
        self.critic.train()
        self.policy_optim.load_state_dict(torch.load(optimizer_actor_path))
        self.critic_optim.load_state_dict(torch.load(optimizer_critic_path))
        self.alpha_optim.load_state_dict(torch.load(optimizer_alpha_path))
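For context, the checkpoint is written roughly like this (a sketch reconstructed from the keys that load_model reads; the exact save code may differ):

    def save_model(self, actor_path, critic_path, optimizer_actor_path,
                   optimizer_critic_path, optimizer_alpha_path):
        # Sketch of the save side, inferred from the keys read in load_model.
        torch.save({'alpha': torch.tensor(self.alpha),
                    'log_alpha': self.log_alpha,
                    'model_state_dict': self.policy.state_dict()}, actor_path)
        torch.save(self.critic.state_dict(), critic_path)
        torch.save(self.policy_optim.state_dict(), optimizer_actor_path)
        torch.save(self.critic_optim.state_dict(), optimizer_critic_path)
        torch.save(self.alpha_optim.state_dict(), optimizer_alpha_path)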
Additional note: when I test them, the policy and critic output the same values as at the end of the previous training run, so the problem has to be in alpha or in the optimizers.
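A quick diagnostic I can run right after calling load_model, to compare the restored temperature and its Adam state against values logged before saving (a sketch; agent stands in for my agent instance):

    print("alpha:", agent.alpha, "log_alpha:", agent.log_alpha.item())
    # Adam keeps per-parameter state: a step count and two moment estimates.
    for state in agent.alpha_optim.state.values():
        print("step:", state["step"],
              "exp_avg:", state["exp_avg"].item(),
              "exp_avg_sq:", state["exp_avg_sq"].item())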