Value loss is increasing on VPG

Hi, I’m implementing the Vanilla Policy Gradient (REINFORCE) with GAE for advantage estimation.
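For context, here is a minimal sketch of the GAE computation I have in mind (not the exact code from my repo; `compute_gae`, `gamma`, and `lam` are just illustrative names):

```python
import numpy as np

def compute_gae(rewards, values, gamma=0.99, lam=0.97):
    """Generalized Advantage Estimation over a single episode.

    rewards: array of shape (T,)
    values:  array of shape (T + 1,), V(s_t) for t = 0..T (last entry is the bootstrap value)
    """
    # One-step TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values[1:] - values[:-1]

    advantages = np.zeros_like(rewards, dtype=np.float64)
    gae = 0.0
    # Accumulate discounted residuals backwards in time
    for t in reversed(range(len(rewards))):
        gae = deltas[t] + gamma * lam * gae
        advantages[t] = gae
    return advantages
```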

When I run VPG, the value loss increases gradually over training.
I expected the value loss to decrease as the gradient updates fit the value function to the observed returns.
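What I expect is roughly the following value-function update (a rough sketch in PyTorch, assuming a separate value network and optimizer; `value_fn`, `value_optimizer`, `observations`, and `returns_to_go` are placeholder names, not the identifiers in my repo):

```python
import torch.nn.functional as F

def fit_value_function(value_fn, value_optimizer, observations, returns_to_go, n_iters=80):
    """Regress V(s) onto empirical returns with MSE; the loss should trend downward."""
    for _ in range(n_iters):
        value_optimizer.zero_grad()
        # Mean-squared error between predicted values and returns-to-go
        value_loss = F.mse_loss(value_fn(observations).squeeze(-1), returns_to_go)
        value_loss.backward()
        value_optimizer.step()
    return value_loss.item()
```

Instead of this loss going down across epochs, it keeps going up.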

What do you think?

My implementation: https://github.com/yamatokataoka/reinforcement-learning-replications/blob/master/rl_replicas/vpg/vpg.py

All stats logged during training:
https://github.com/yamatokataoka/reinforcement-learning-replications/files/5352419/08102020_performance_vpg_separate_optim_and_value_fn.txt