Hello,
I am training a SAC agent and want to be able to resume training from a checkpoint. However, the loss explodes after I resume.
In SAC the trainable parameters are an actor (also called the policy), a critic, and a temperature parameter alpha.
Moreover, to resume training I also have to reload the experience replay buffer.
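For completeness, this is roughly how I persist the replay buffer between runs. It is a minimal sketch assuming the buffer is an in-memory Python structure (here a bounded deque of transition tuples; the file name and transition shape are placeholders):

```python
import pickle
from collections import deque

# Hypothetical replay buffer: a bounded deque of
# (state, action, reward, next_state, done) tuples.
buffer = deque(maxlen=1_000_000)
buffer.append(((0.0, 0.0), 1, 0.5, (0.1, 0.0), False))

# Persist it next to the model checkpoints when saving...
with open("replay_buffer.pkl", "wb") as f:
    pickle.dump(buffer, f)

# ...and reload it on resume so training continues on the same data.
with open("replay_buffer.pkl", "rb") as f:
    restored = pickle.load(f)
```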
I think the issue is in the alpha parameter, as the alpha loss jumps from near 0 to about 40,000 after resuming. Am I loading it correctly?
def load_model(self, actor_path, critic_path, optimizer_actor_path,
               optimizer_critic_path, optimizer_alpha_path):
    # The actor checkpoint is a dict holding alpha, log_alpha and the state dict
    policy = torch.load(actor_path, map_location=self.device)
    self.alpha = policy['alpha'].detach().item()
    # Recreate log_alpha as a leaf tensor so a fresh optimizer can update it
    self.log_alpha = torch.tensor([policy['log_alpha'].detach().item()],
                                  requires_grad=True, device=self.device)
    self.alpha_optim = Adam([self.log_alpha], lr=self.lr)
    self.policy.load_state_dict(policy['model_state_dict'])
    self.policy.train()
    self.critic.load_state_dict(torch.load(critic_path, map_location=self.device))
    self.critic.train()
    self.policy_optim.load_state_dict(torch.load(optimizer_actor_path))
    self.critic_optim.load_state_dict(torch.load(optimizer_critic_path))
    self.alpha_optim.load_state_dict(torch.load(optimizer_alpha_path))
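For reference, the save side looks roughly like this. It is a minimal sketch that only mirrors the keys the loader reads ('alpha', 'log_alpha', 'model_state_dict'); the tiny nn.Linear actor and the file name are placeholders, not my real network:

```python
import torch
from torch import nn

# Hypothetical tiny actor standing in for the real policy network.
policy = nn.Linear(4, 2)
log_alpha = torch.zeros(1, requires_grad=True)

# Save a checkpoint whose keys mirror what load_model reads.
torch.save({
    'alpha': log_alpha.exp().detach(),
    'log_alpha': log_alpha.detach(),
    'model_state_dict': policy.state_dict(),
}, 'actor.pt')

checkpoint = torch.load('actor.pt')
```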
Additional note: when I test the policy and critic after loading, they output the same values as before the restart, so the problem has to be in alpha or the optimizers.
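One thing I am unsure about: since alpha is cached as a plain float, it has to be refreshed from log_alpha after every temperature update, or the stale value silently diverges from the optimized one. A minimal sketch of the standard SAC temperature update (target_entropy and the sampled log_prob here are made-up values, not from my run):

```python
import torch
from torch.optim import Adam

log_alpha = torch.zeros(1, requires_grad=True)
alpha_optim = Adam([log_alpha], lr=3e-4)
target_entropy = -1.0            # e.g. -dim(action_space)
log_prob = torch.tensor([-0.5])  # hypothetical sampled log-probability

# Standard SAC temperature loss: push entropy toward target_entropy.
alpha_loss = -(log_alpha * (log_prob + target_entropy).detach()).mean()

alpha_optim.zero_grad()
alpha_loss.backward()
alpha_optim.step()

# Keep the float copy in sync after every optimizer step.
alpha = log_alpha.exp().item()
```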