How to represent "terminate episode" for the Knapsack problem with a Pointer Network?

gemsanyou · September 2, 2021, 5:58am

I am currently implementing a Pointer Network to solve a simple Knapsack Problem. However, I am a bit puzzled about the correct (or common, or "best") way to give the agent the option to stop taking items, i.e. to terminate the episode. So far I have tried two approaches, both using a dummy item whose features are all zeros: appending the dummy to the raw features before encoding, or appending it to the already encoded features. I trained both methods for 500K episodes and, in each episode, evaluated their performance on a single predefined test case after applying the gradient. I found that concatenating the dummy features with the encoded features reached a higher score earlier, but also scored 0 very often. On the other hand, the method that adds the dummy features to the raw features learned to maximize the score very slowly. Therefore, my questions are:
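To make the two setups concrete, here is a minimal sketch of how I understand them. It is not my actual training code: the per-item `tanh` encoder, the dot-product pointer scores, the function names, and the item values are all assumptions made up for illustration.

```python
import math
import random

def encode(item, W, b):
    """Hypothetical per-item encoder: tanh(W @ item + b)."""
    return [math.tanh(sum(w * x for w, x in zip(row, item)) + bias)
            for row, bias in zip(W, b)]

def pointer_probs(embeddings, query):
    """Dot-product pointer scores, then a softmax over all candidates."""
    scores = [sum(e * q for e, q in zip(emb, query)) for emb in embeddings]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def with_raw_dummy(items, W, b):
    """Variant A: append an all-zero item, then encode everything."""
    dummy = [0.0] * len(items[0])
    return [encode(x, W, b) for x in items + [dummy]]

def with_encoded_dummy(items, W, b):
    """Variant B: encode the real items, then append an all-zero embedding."""
    hidden = len(W)
    return [encode(x, W, b) for x in items] + [[0.0] * hidden]

# Randomly initialised parameters (assumed sizes, for illustration only).
rng = random.Random(0)
hidden_size, n_features = 4, 2
W = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(hidden_size)]
b = [rng.uniform(-1, 1) for _ in range(hidden_size)]
query = [rng.uniform(-1, 1) for _ in range(hidden_size)]

items = [[0.3, 0.9], [0.5, 0.2], [0.8, 0.7]]  # made-up (weight, value) pairs
stop_index = len(items)  # pointing at this index ends the episode

emb_a = with_raw_dummy(items, W, b)
emb_b = with_encoded_dummy(items, W, b)
probs_a = pointer_probs(emb_a, query)
probs_b = pointer_probs(emb_b, query)
print("P(stop), raw dummy:    ", probs_a[stop_index])
print("P(stop), encoded dummy:", probs_b[stop_index])
```

In this sketch the only difference is the embedding at `stop_index`: with the raw dummy it equals `tanh(b)` and therefore moves as the encoder trains, while with the encoded dummy it stays at zero, so its dot-product score is always 0 regardless of the query. Whether this difference accounts for the behavior I observed is exactly what I am unsure about.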

Does adding the dummy features at the raw-feature level make learning slower because the encoding layer additionally has to learn how to encode them?
What is the most correct (or common, or arguably best) way to give the agent the option to terminate the episode (in this case, to stop taking items)?

Thank you