I have been able to train a system using Q-learning. I now need to move it to the Actor-Critic (A2C) method. Please don't ask why; I have to.
I am borrowing the implementation from https://github.com/higgsfield/RL-Adventure-2/blob/master/1.actor-critic.ipynb
The thing is, I keep getting a success rate of roughly 50%, which is basically random behavior. My episodes are long (up to 50 steps). How should I debug this? Should I print out the reward, the value estimates, or something else?
Here are some logs:
simulation episode 2: Success, turn_count = 20, loss = tensor(1763.7875)
simulation episode 3: Fail, turn_count = 42, loss = tensor(44.6923)
simulation episode 4: Fail, turn_count = 42, loss = tensor(173.5872)
simulation episode 5: Fail, turn_count = 42, loss = tensor(4034.0889)
simulation episode 6: Fail, turn_count = 42, loss = tensor(132.7567)
simulation episode 7: Success, turn_count = 22, loss = tensor(2099.5344)
As a general trend, I have observed that the loss tends to be huge for Success episodes, whereas for Fail episodes it tends to be small. Any suggestions?
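To make the question concrete, this is the kind of per-episode diagnostic I was thinking of adding (a minimal sketch; `rewards` and `values` stand for the lists collected during one rollout, and the function name and stats are my own, not from the notebook):

```python
import torch

def episode_diagnostics(rewards, values, gamma=0.99):
    """Per-episode stats to print alongside the loss:
    discounted returns, critic value estimates, and advantages."""
    # Discounted returns computed backwards from the end of the episode
    returns = []
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns = torch.tensor(list(reversed(returns)), dtype=torch.float32)
    values = torch.as_tensor(values, dtype=torch.float32)
    advantages = returns - values  # large advantages -> large actor loss terms
    return {
        "mean_return": returns.mean().item(),
        "mean_value": values.mean().item(),
        "mean_advantage": advantages.mean().item(),
        "advantage_std": advantages.std().item(),
    }

# Dummy 5-step episode with a sparse terminal reward of 1.0
stats = episode_diagnostics([0, 0, 0, 0, 1.0], [0.2, 0.1, 0.3, 0.5, 0.9])
print(stats)
```

My thinking is that if the critic systematically underestimates returns on successful episodes, the advantages (and hence the loss) would blow up exactly the way my logs show, so tracking these stats should confirm or rule that out.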