If you’re not familiar with OpenAI Gym: it provides environments for a Markov decision process, and each step returns the observation, the reward, whether the episode is over, and some extra info. Is there an established / optimal way of dealing with this in LibTorch?
I have a guess at the end of this post, but I’d like to hear any ideas.
I don’t need the C++ version of Gym, which I’m pretty sure doesn’t exist. I’m going to make my own Gym but I don’t know how to store and retrieve this kind of data.
This is typical of what you get back after each ‘step’:
(array([ 0.02957329, 0.20411213, -0.04598045, -0.2613098 ], dtype=float32),
1.0,
False,
{some info})
That’s just what Gym returns after each step, and I need more than this. I need to create large buffers that hold the current state, action, reward, done flag, and next state.
I have NumCpp (a C++ implementation of NumPy), but I don’t think NumCpp and LibTorch work together the way PyTorch and NumPy do in Python. What should I include to work with this data?
My guess at a solution is to just make one large tensor for everything: a separate long tensor each for states, next states, actions, rewards, and dones.
I.e. the next-state part of the buffer would be one large tensor, and I would append each new next state with torch::cat,
then use the same index into all five tensors when recalling or deleting entries. I think there’s no way for this to be slow. What do you think?
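To make the layout concrete, here is a minimal sketch of that five-field buffer as a fixed-capacity ring buffer. It’s written in plain C++ (std only, no LibTorch dependency) and assumes a hypothetical CartPole-style 4-float state; the `Transition` struct and all names are my own illustration, not an established API. The same layout maps onto five preallocated LibTorch tensors of shape `{capacity, ...}` where you overwrite row `head` in place (e.g. with `Tensor::copy_` on an indexed row) instead of growing with `torch::cat`, since `cat` allocates a fresh tensor and copies the whole buffer on every append.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical transition record: fixed-size 4-float state (CartPole-like).
struct Transition {
    std::array<float, 4> state;
    std::int64_t action;
    float reward;
    bool done;
    std::array<float, 4> next_state;
};

// Minimal ring buffer: preallocate once, overwrite the oldest entry when full.
// All five fields live in one record, so a single index recalls a whole
// transition -- the same "one index into all five tensors" idea from above.
class ReplayBuffer {
public:
    explicit ReplayBuffer(std::size_t capacity) : data_(capacity) {}

    void push(const Transition& t) {
        data_[head_] = t;                     // O(1) per step, no reallocation
        head_ = (head_ + 1) % data_.size();   // wrap around when full
        if (size_ < data_.size()) ++size_;
    }

    const Transition& at(std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    std::vector<Transition> data_;
    std::size_t head_ = 0;  // next slot to write
    std::size_t size_ = 0;  // number of valid entries
};
```

With `torch::cat` each append copies everything already stored, so filling a buffer of N transitions costs O(N²) total; the overwrite-in-place pattern above keeps each step O(1), which matters once the buffer holds hundreds of thousands of transitions.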