I am currently implementing a Pointer Network to solve a simple Knapsack Problem. However, I am a bit puzzled about the correct (or common or “best”) way to give the agent the option to stop taking items (i.e. terminate the episode). So far I have tried two approaches: appending a dummy item to the raw features before encoding, or appending a dummy item to the already encoded features (in both cases the dummy features are all zeros). I trained both variants for 500K episodes and evaluated their performance on a single predefined test case in each episode, after applying the gradient update. I found that concatenating the dummy features with the encoded features reached a higher score earlier, but also scored 0 very often. On the other hand, the variant that adds the dummy features to the raw features learned to maximize the score very slowly. Therefore, my questions are:
- Does adding the dummy features to the raw features make learning slower because the encoding layer additionally has to learn how to encode them?
- What is the most correct (or common or arguably best) way to give the agent the option to terminate the episode (in this case, to stop taking items)?
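
To make the two variants concrete, here is a minimal pure-Python sketch of how they might look. It is not the actual training code: it assumes a single linear + ReLU encoder layer, and the names `encode`, `with_raw_dummy`, `with_encoded_dummy`, `W` and `b` are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: two ways to add a "stop" candidate for a pointer network.
# Assumes a one-layer linear + ReLU encoder; all names are illustrative.

def encode(rows, W, b):
    """Apply a linear layer followed by ReLU to each feature row."""
    out = []
    for x in rows:
        h = [sum(w * xi for w, xi in zip(w_row, x)) + bj
             for w_row, bj in zip(W, b)]
        out.append([max(0.0, v) for v in h])
    return out

def with_raw_dummy(items, W, b):
    """Variant A: append an all-zero item to the raw features, then encode."""
    dummy = [0.0] * len(items[0])
    return encode(items + [dummy], W, b)

def with_encoded_dummy(items, W, b):
    """Variant B: encode the real items, then append an all-zero embedding."""
    dummy = [0.0] * len(W)
    return encode(items, W, b) + [dummy]

# Two items described by (weight, value); the encoder maps 2 features to 3.
items = [[2.0, 3.0], [1.0, 4.0]]
W = [[0.5, 0.1], [-0.2, 0.3], [0.0, 1.0]]  # 3 output units x 2 input features
b = [0.1, -0.5, 0.2]

emb_a = with_raw_dummy(items, W, b)      # stop embedding equals ReLU(b)
emb_b = with_encoded_dummy(items, W, b)  # stop embedding is exactly zero
```

Under these assumptions the real items get identical embeddings in both variants, and only the stop candidate differs: in variant A it equals `ReLU(b)` and therefore moves as the encoder trains, while in variant B it stays fixed at zero. Whether that difference accounts for the learning behaviour described above is exactly what I am unsure about.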
Thank you