Actor-Critic implementation problem: rewards not improving

Hi,
I am trying to implement a plain and simple A2C. I have tried, but I cannot find out why the policy network is not getting better (the rewards are not improving).

I cannot figure out whether the reward function is incorrect, or whether I am destroying the gradient somewhere so that the network cannot train. Could anyone help me find the error in my implementation?
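For context, here is a minimal sketch of the discounted-return and advantage computation that A2C typically relies on, written in plain Python so it can be checked by hand. The names `discounted_returns`, `advantages`, `gamma`, `dones`, and `bootstrap_value` are illustrative assumptions and are not taken from breakout-forum.py.

```python
def discounted_returns(rewards, dones, gamma=0.99, bootstrap_value=0.0):
    """Compute discounted returns for one rollout, walking backwards in time.

    rewards: list of per-step rewards
    dones: list of booleans, True where the episode ended at that step
    bootstrap_value: critic's value estimate for the state after the last step
                     (assumed to be supplied by the caller)
    """
    returns = []
    running = bootstrap_value
    for reward, done in zip(reversed(rewards), reversed(dones)):
        if done:
            # Do not carry value across an episode boundary.
            running = 0.0
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage estimate: return minus the critic's value for each step."""
    return [ret - val for ret, val in zip(returns, values)]


if __name__ == "__main__":
    rets = discounted_returns([1.0, 1.0, 1.0], [False, False, False], gamma=0.5)
    print(rets)
    print(advantages(rets, [1.0, 1.0, 1.0]))
```

Comparing hand-computed values like these against what the training loop produces may help separate a reward problem from a gradient problem. In the usual A2C setup the advantage is treated as a constant in the actor loss, while the returns used as the critic's target carry no gradient of their own.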

GitHub project link

My agent's action function: breakout-forum.py – Line 95

My train function: breakout-forum.py – Line 156

Any help is appreciated.

Thanks.