Dear community, could you tell me what might be going wrong if my reward gets worse over time while the loss keeps increasing?
I am using a simple dueling network architecture with linear layers.
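
For reference, "dueling" here means the usual split into a state-value stream V(s) and an advantage stream A(s, a), recombined as Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a). Below is a minimal, framework-free sketch of that idea using only linear layers. The layer sizes, the initialisation range, and every name in it (`init_linear`, `dueling_q_values`, `params`) are illustrative assumptions, not the actual training code.

```python
import random


def linear(x, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]


def init_linear(n_in, n_out, rng):
    """Small random weights and zero bias (hypothetical init scheme)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def dueling_q_values(state, params):
    """Return one Q-value per action from a dueling head."""
    hidden = relu(linear(state, *params["shared"]))
    value = linear(hidden, *params["value"])[0]         # scalar V(s)
    advantages = linear(hidden, *params["advantage"])   # A(s, a) per action
    mean_adv = sum(advantages) / len(advantages)
    # Subtracting the mean advantage keeps V and A identifiable.
    return [value + a - mean_adv for a in advantages]


# Assumed sizes: 4 state features, 8 hidden units, 3 actions.
rng = random.Random(0)
params = {
    "shared": init_linear(4, 8, rng),
    "value": init_linear(8, 1, rng),
    "advantage": init_linear(8, 3, rng),
}
q = dueling_q_values([0.1, -0.2, 0.3, 0.0], params)
```

With this combination the mean of the Q-values over actions equals V(s), which is a quick sanity check to run on a real network's outputs as well.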