If you’re not familiar with OpenAI Gym: it provides environments for a Markov decision process, and each step returns the observation, the reward, whether the episode is over, and some extra info. Is there an established / optimal way of dealing with this in LibTorch?
I have a guess at the end of this post, but I’d like to hear any ideas.
I don’t need the C++ version of Gym, which I’m pretty sure doesn’t exist. I’m going to make my own Gym but I don’t know how to store and retrieve this kind of data.
This is typical of what you get back after each ‘step’:
(array([ 0.02957329, 0.20411213, -0.04598045, -0.2613098 ], dtype=float32),
1.0,
False,
{some info})
That’s just what Gym returns after each step, and I need more than this. I need to create large buffers that hold the current state, action, reward, done flag, and next state.
I have NumCpp (a C++ implementation of NumPy), but I don’t think NumCpp and LibTorch work together the way PyTorch and NumPy do in Python. What should I include to work with this data?
My guess at a solution is to just make one large tensor for everything: a separate long tensor each for states, next states, actions, rewards, and dones.
I.e. the next-state part of the buffer would be one large tensor, and I would append each new next state with torch::cat,
then use the same index into all five tensors when recalling or deleting entries. I think there’s no way for this to be slow. What do you think?
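To make the layout concrete, here is a minimal sketch of that five-field buffer as a fixed-capacity ring buffer. It’s written in plain C++ (std only, no LibTorch dependency) and assumes a hypothetical CartPole-style 4-float state; the `Transition` struct and all names are my own illustration, not an established API. The same layout maps onto five preallocated LibTorch tensors of shape `{capacity, ...}` where you overwrite row `head` in place (e.g. with `Tensor::copy_` on an indexed row) instead of growing with `torch::cat`, since `cat` allocates a fresh tensor and copies the whole buffer on every append.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical transition record: fixed-size 4-float state (CartPole-like).
struct Transition {
    std::array<float, 4> state;
    std::int64_t action;
    float reward;
    bool done;
    std::array<float, 4> next_state;
};

// Minimal ring buffer: preallocate once, overwrite the oldest entry when full.
// All five fields live in one record, so a single index recalls a whole
// transition -- the same "one index into all five tensors" idea from above.
class ReplayBuffer {
public:
    explicit ReplayBuffer(std::size_t capacity) : data_(capacity) {}

    void push(const Transition& t) {
        data_[head_] = t;                     // O(1) per step, no reallocation
        head_ = (head_ + 1) % data_.size();   // wrap around when full
        if (size_ < data_.size()) ++size_;
    }

    const Transition& at(std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    std::vector<Transition> data_;
    std::size_t head_ = 0;  // next slot to write
    std::size_t size_ = 0;  // number of valid entries
};
```

With `torch::cat` each append copies everything already stored, so filling a buffer of N transitions costs O(N²) total; the overwrite-in-place pattern above keeps each step O(1), which matters once the buffer holds hundreds of thousands of transitions.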