Value loss is increasing in VPG

Hi, I’m implementing Vanilla Policy Gradient (REINFORCE) with GAE for advantage estimation.

When I run VPG, the value loss increases gradually over training.
I would expect the value loss to decrease, since each gradient update is supposed to fit the value function to the return targets.
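For reference, the GAE computation I’m following is the standard one, roughly like this (a simplified NumPy sketch with illustrative names, not my exact code; the value targets used for the value loss are `advantages + values`):

```python
import numpy as np

def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    rewards:    (T,) rewards collected at each step
    values:     (T,) value-function predictions for each state
    last_value: bootstrap value for the state after the final step
    Returns (advantages, returns); `returns` are the regression
    targets for the value loss.
    """
    T = len(rewards)
    values_ext = np.append(values, last_value)  # V(s_0)..V(s_T)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values_ext[t + 1] - values_ext[t]
        # Discounted sum of residuals, weighted by lambda
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    returns = advantages + values  # value targets: A_t + V(s_t)
    return advantages, returns
```

Note the targets should be treated as constants when computing the value loss (i.e., no gradient should flow through `returns` back into the value network).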

What do you think?

My implementation:

All stats during training: