The critic loss should just be something like:
critic_loss = ((reward+(gamma*critic_new.detach())) - critic_old)**2
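For context, here is a minimal sketch of how that line might sit in a full update step. The `nn.Linear` critic, the batch size, the state dimension, the `gamma` value, and the `.mean()` reduction are all assumptions made for illustration, not something from the thread. The point is only that `critic_new` is detached, so the bootstrap target is treated as a constant and gradients flow through `critic_old` alone.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical critic: maps a 4-dimensional state to a scalar value estimate.
critic = nn.Linear(4, 1)
gamma = 0.99  # assumed discount factor

# Dummy batch of 8 transitions (random data, for illustration only).
state = torch.randn(8, 4)
next_state = torch.randn(8, 4)
reward = torch.randn(8)

critic_old = critic(state).squeeze(-1)       # V(s), keeps its graph
critic_new = critic(next_state).squeeze(-1)  # V(s'), used only as a target

# detach() stops gradients from flowing through the bootstrap target.
td_target = reward + gamma * critic_new.detach()

# Squared TD error, averaged over the batch so backward() gets a scalar.
critic_loss = ((td_target - critic_old) ** 2).mean()
critic_loss.backward()
```

Without the `.detach()`, the loss would also push `critic_new` toward `critic_old`, which is usually not what you want for a TD-style update.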