Actor-Critic Implementation Problem: Rewards Not Improving

I am trying to implement a plain and simple A2C. I have tried, but I cannot find out why the policy network is not getting better (the rewards are not improving).

I cannot figure out whether the reward function is incorrect or whether I am destroying the gradient somewhere, so that the network is not able to train. Could anyone help me find the error in my implementation?
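One way to check the reward side in isolation is to compute the discounted returns and advantages by hand on a tiny trajectory and compare them with what the training code produces. The sketch below assumes rollouts are stored as Python lists of rewards and `done` flags; the names `discounted_returns` and `compute_advantages` are hypothetical and not taken from the project. Common mistakes it guards against are iterating forward instead of backward, and not resetting the running return at episode boundaries.

```python
def discounted_returns(rewards, dones, gamma=0.99, bootstrap_value=0.0):
    """Compute G_t = r_t + gamma * G_{t+1}, walking the rollout backwards.

    dones[t] is assumed to mean the episode ended after step t, so the
    return from step t must not include rewards from the next episode.
    bootstrap_value is the critic's V(s) for the state after the last
    step, used only when the rollout was cut off mid-episode.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # episode boundary: do not leak future rewards
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def compute_advantages(returns, values):
    """Advantage A_t = G_t - V(s_t), using the critic's value estimates."""
    return [g - v for g, v in zip(returns, values)]


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], [False, False, True], gamma=0.5))
    # [1.75, 1.5, 1.0]
```

In A2C the actor is usually pushed along -log π(a|s) · A with A treated as a constant, and the critic is regressed toward G; if the advantages from a hand-checked case like this look wrong, the problem is on the reward side rather than in the gradients.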

GitHub Project Link

My agent's action function: line 95

My train function: line 156
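For comparison, here is a minimal sketch of the action and training steps working together on a hypothetical toy problem: a single-state, two-armed bandit, written in pure Python with hand-derived gradients instead of the framework used in the project. All names (`select_action`, `train_bandit`, `arm_means`) are assumptions for illustration. Because there is only one state, there is no bootstrapped next-state value; the critic is a single number estimating the expected reward. It shows the pieces a working actor-critic needs: the action is sampled from the current policy, the advantage is the reward minus the critic's estimate, the actor moves along advantage × ∇log π, and the critic moves toward the observed reward.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(prefs, rng):
    """'Agent action' step: sample from the current policy, not argmax."""
    probs = softmax(prefs)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs


def train_bandit(arm_means, steps=2000, lr_actor=0.1, lr_critic=0.1, seed=0):
    """One-step actor-critic on a toy bandit (hypothetical example)."""
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_means)  # actor parameters (action preferences)
    value = 0.0  # critic: estimate of the expected reward
    for _ in range(steps):
        action, probs = select_action(prefs, rng)
        reward = rng.gauss(arm_means[action], 1.0)  # noisy reward
        advantage = reward - value  # treated as a constant for the actor
        value += lr_critic * advantage  # critic moves toward the reward
        # d/d prefs[i] of log softmax(prefs)[action] = onehot(action)[i] - probs[i]
        for i in range(len(prefs)):
            grad_log_pi = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr_actor * advantage * grad_log_pi
    return softmax(prefs), value


if __name__ == "__main__":
    probs, value = train_bandit([0.0, 1.0])
    print("final policy:", probs, "critic value:", value)
```

Assuming a PyTorch-style autograd framework, the equivalent actor loss is `-log_prob(action) * advantage.detach()` plus a value loss on the critic. If the log-probability or value is turned into a plain number somewhere (for example via `.item()` or a round trip through NumPy), no gradient reaches the network, which matches the symptom of rewards that never improve.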

Any help is appreciated.