Actor network loss in Proximal Policy Optimization

Hi everyone,

I have implemented PPO with the actor and the critic as completely separate networks. During training, the actor loss sometimes becomes negative. I had assumed a loss of 0 was "optimal", meaning the network could not do better than that. How can a negative actor loss be explained? And if it is not unusual, should my goal be to reach the most negative loss possible rather than 0?
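For context, the actor loss I mean is the standard PPO clipped surrogate objective. Here is a minimal NumPy sketch of how I compute it (variable names are mine, just for illustration); with positive advantages the surrogate term is positive, so its negation, the loss, comes out below zero:

```python
import numpy as np

def ppo_actor_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    # Probability ratio r = pi_new(a|s) / pi_old(a|s)
    ratio = np.exp(log_probs_new - log_probs_old)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # PPO maximizes the clipped surrogate, so the training loss is
    # its negative mean; nothing constrains it to be >= 0.
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Positive advantages -> positive surrogate -> negative loss
loss = ppo_actor_loss(
    log_probs_new=np.log([0.6, 0.5]),
    log_probs_old=np.log([0.5, 0.5]),
    advantages=np.array([1.0, 2.0]),
)
print(loss)  # negative, here -1.6
```

So unlike a squared error, this quantity has no natural floor at 0, which is what confuses me about interpreting it.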

The critic loss, by contrast, is always >= 0, since it is a mean squared error.
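That part makes sense to me: the critic loss is a mean of squared terms, so it cannot go below zero. A sketch (again, names are illustrative):

```python
import numpy as np

def critic_loss(values, returns):
    # Mean squared error between predicted values and targets;
    # an average of squares is always >= 0.
    return np.mean((values - returns) ** 2)

print(critic_loss(np.array([1.0, 2.0]), np.array([0.5, 3.0])))  # 0.625
```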

Thank you in advance!