How to optimize the RL algorithm?

For text summarization, I want to optimize the reinforcement learning algorithm.

So what have you tried so far and where are you struggling?

There are actor-critic models with policy gradient, but I do not know what can be done to make them converge more quickly and to improve the quality of the generated summaries.
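
To make the setup concrete, here is a minimal sketch of the kind of actor-critic policy-gradient loss being discussed. It is only an illustration under assumptions that are not stated in the thread: the reward is a single score given at the end of the summary (for example a ROUGE-like metric), the critic outputs one value estimate per generated token, and the function names `discounted_returns` and `actor_critic_losses` are hypothetical. Subtracting the critic's value from the return (the advantage) is a standard way to reduce gradient variance, which generally helps convergence.

```python
from typing import List, Tuple


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return-to-go G_t = r_t + gamma * G_{t+1} for every step t."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def actor_critic_losses(
    log_probs: List[float],  # log pi(a_t | s_t) of each generated token
    rewards: List[float],    # per-step rewards (assumed: zero until the last token)
    values: List[float],     # critic estimates V(s_t), one per token
    gamma: float = 0.99,
) -> Tuple[float, float]:
    """Compute (actor_loss, critic_loss) for one sampled summary."""
    returns = discounted_returns(rewards, gamma)
    # Advantage: how much better the sampled outcome was than the critic expected.
    advantages = [g - v for g, v in zip(returns, values)]
    n = len(rewards)
    # Policy-gradient loss with the critic as a baseline.
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    # The critic is trained by regression toward the observed returns.
    critic_loss = sum(a * a for a in advantages) / n
    return actor_loss, critic_loss
```

In a real model these quantities would be tensors in an autodiff framework, with the advantage treated as a constant when differentiating the actor loss. The plain-Python version above only shows the arithmetic.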