Value loss is increasing in VPG

Hi, I’m implementing Vanilla Policy Gradient (REINFORCE) with GAE for advantage estimation.

When I run VPG, the value loss increases gradually over training.
I would expect the value loss to decrease, since each gradient update is supposed to fit the value function to the return targets.
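For reference, the GAE computation I’m following is the standard one, roughly like this (a simplified NumPy sketch with illustrative names, not my exact code; the value targets used for the value loss are `advantages + values`):

```python
import numpy as np

def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    rewards:    (T,) rewards collected at each step
    values:     (T,) value-function predictions for each state
    last_value: bootstrap value for the state after the final step
    Returns (advantages, returns); `returns` are the regression
    targets for the value loss.
    """
    T = len(rewards)
    values_ext = np.append(values, last_value)  # V(s_0)..V(s_T)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values_ext[t + 1] - values_ext[t]
        # Discounted sum of residuals, weighted by lambda
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    returns = advantages + values  # value targets: A_t + V(s_t)
    return advantages, returns
```

Note the targets should be treated as constants when computing the value loss (i.e., no gradient should flow through `returns` back into the value network).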

What do you think?

My implementation:

All stats during training: