My PyTorch reinforcement learning AI doesn't react to reward

PringleOrange · May 8, 2022, 5:02pm

Hello. I have this Python PyTorch reinforcement learning AI. Its goal is to choose a character from a-z, and if it chooses ‘d’ it gets rewarded. It isn’t rewarded or punished for anything else. Currently it just picks one random letter from a-z and keeps choosing it. The only time it doesn’t stick with that character is when its epsilon value forces it to choose one at random, but no matter how many times it stumbles on ‘d’ it doesn’t react to the reward it was given. Below is a link to the code which calculates what move it makes based on its reward, gamma and so on. Is there anything here which may be the reason why the bot won’t choose the rewarding character?

https://paste.pythondiscord.com/hekedehepe

InnovArul · May 8, 2022, 7:52pm

target = pred.clone()
for idx in range(len(done)):
    Q_new = reward[idx]
    if not done[idx]:
        Q_new = reward[idx] + self.gamma * torch.max(self.model(next_state[idx]))

    target[idx][torch.argmax(action[idx]).item()] = Q_new

When calculating the target, I guess you have to put the model in eval() mode (if there is batchnorm involved) and run it under torch.no_grad(), so that the target is treated as a constant and no gradient flows back through it. Also, when cloning pred into the target variable, you may have to detach it?

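# detach so the target is not part of the autograd graph
target = pred.detach().clone()
with torch.no_grad():  # no gradients through the bootstrapped value
    for idx in range(len(done)):
        Q_new = reward[idx]
        if not done[idx]:
            Q_new = reward[idx] + self.gamma * torch.max(self.model(next_state[idx]))

        # overwrite only the value of the action that was actually taken
        target[idx][torch.argmax(action[idx]).item()] = Q_new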
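
To check the arithmetic of that loop by hand, here is a minimal plain-Python sketch of the same target rule, without PyTorch. The function name build_targets and all of the sample numbers are made up for illustration, and next_q is assumed to hold the already-computed model outputs for each next state (standing in for self.model(next_state[idx])). It is not taken from the linked code.

```python
# Plain-Python sketch of the per-sample target update from the loop above.
# build_targets and the sample data are hypothetical, for illustration only.

def build_targets(pred, actions, rewards, next_q, dones, gamma=0.9):
    """Copy pred and replace the chosen action's value with the Q-learning target."""
    targets = [row[:] for row in pred]  # copy so pred itself is left untouched
    for idx in range(len(dones)):
        q_new = rewards[idx]
        if not dones[idx]:
            # bootstrap from the best predicted value of the next state
            q_new = rewards[idx] + gamma * max(next_q[idx])
        # index of the 1 in the one-hot action vector (argmax)
        chosen = actions[idx].index(max(actions[idx]))
        targets[idx][chosen] = q_new
    return targets


pred = [[0.2, 0.5, 0.1], [0.3, 0.0, 0.4]]     # predicted Q-values, 2 samples
actions = [[0, 1, 0], [0, 0, 1]]              # one-hot actions taken
rewards = [1.0, 0.0]
next_q = [[0.0, 2.0, 1.0], [0.5, 0.5, 0.5]]   # assumed next-state Q-values
dones = [False, True]

targets = build_targets(pred, actions, rewards, next_q, dones, gamma=0.5)
print(targets)  # [[0.2, 2.0, 0.1], [0.3, 0.0, 0.0]]
```

Only the entry for the action taken changes in each row: the first sample gets 1.0 + 0.5 * 2.0 = 2.0, and the second, being terminal, gets just its reward of 0.0. Every other entry stays equal to the prediction, so those entries contribute nothing to the loss.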

PringleOrange · May 8, 2022, 10:38pm

Hello. Thank you for your response, although that didn’t seem to affect the output. I’ve attached the other 2 files, which use model.py and contain all of the logic for what it outputs, in case that’s what is causing the error.
https://paste.pythondiscord.com/rowotaboho
https://paste.pythondiscord.com/pujewikiyi