[Solved] Implementation of A2C doesn't learn

Ricocotam · August 17, 2018, 7:37am

Hey everyone
I’m new to the RL field, so I’m re-implementing all the classical algorithms. I did DQN and REINFORCE, but I’m having trouble with A2C. I coded it from scratch and it doesn’t learn (it even does worse than random). I compared my code with the examples on GitHub and I can’t spot the difference. I’ve been through it many times and I still don’t get it.
Here is my code and the example.

I have another question: why does the example normalize the discounted rewards (subtract the mean, then divide by the standard deviation)?
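For reference, here is a minimal sketch of the step I’m asking about, written in plain Python rather than copied from the example. The names `discounted_returns`, `normalize`, `gamma` and `eps` are my own placeholders, and I use the population standard deviation here, which may differ slightly from what a given framework computes by default. As far as I understand, the usual motivation is to keep the scale of the returns roughly constant across episodes so the gradient magnitude is more stable, but treat that as my assumption.

```python
import statistics


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to (roughly) unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std; an assumption of this sketch
    # eps avoids a division by zero when all returns are identical.
    return [(v - mean) / (std + eps) for v in values]


rewards = [1.0, 1.0, 1.0, 1.0]  # hypothetical episode: reward 1 at every step
returns = discounted_returns(rewards, gamma=0.5)
normalized = normalize(returns)
print(returns)
print(normalized)
```

After normalization, the early steps (above-average return) get a positive weight and the late steps (below-average return) get a negative one, whatever the raw reward scale was.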

Thanks for the help

Edit: After an hour of inspecting every value I could analyse, the cause turned out to be a misspelled variable. GG.