Hey everyone
I’m new to the RL field, so I’m re-implementing all the classical algorithms. I did DQN and REINFORCE, but I’m having some trouble with A2C. I coded it from scratch and it doesn’t learn (it even does worse than random). I compared my code with the examples on GitHub and I can’t spot the difference. I’ve been through it many times and I still don’t get it.
Here is my code and the example.
I have another question: why does the example normalize the discounted rewards (subtract the mean, divide by the std)?
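For reference, here is a minimal sketch of what that normalization step usually looks like, assuming the example computes per-episode discounted returns and then standardizes them (the function names and the `eps` term are my own, not from the example). The reason people commonly give is variance reduction: subtracting the mean acts like a crude baseline, so roughly half the actions get pushed up and half pushed down, and dividing by the std keeps the gradient scale similar across episodes with very different reward magnitudes. It's a heuristic, not part of the A2C definition; in A2C the critic's value estimate already serves as the baseline.

```python
import statistics


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} by iterating backwards over one episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()  # restore chronological order
    return returns


def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to unit (population) std.

    eps is an assumed small constant that avoids dividing by zero
    when all returns are equal.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
print(returns)             # [1.75, 1.5, 1.0]
print(normalize(returns))  # zero mean, unit std
```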
Thanks for the help
Edit: after one hour of inspecting every value I could analyse, the culprit turned out to be a misspelled variable. GG