Hi, I am working on an RL project for which I made a custom (stateful) environment that handles everything on GPU tensors. I want it to handle batches in a vectorized manner as well, but I am struggling to pin down the exact requirements and cannot find any good docs.
From what I understand, my tensors will essentially just have a leading batch dimension. My question is: what happens when one of the environments terminates (either a failure, because an agent exceeded the bounds, or a success, because the goal was achieved) while the others are still running? What should step return?
Any example or docs for this would be greatly appreciated.
Hi Vincent, thanks for your prompt response. I’m using the SyncDataCollector class to run the environments. I have one more question:
Will the “_reset” key be populated from the “done” key for every batch entry, and will self.reset() be called automatically during rollouts with the collector, or do I need to specify this somewhere?
With batch size 3, the “_reset” key would look something like [[True], [False], [True]] if I want to reset the 1st and 3rd entries in the batch. I would presumably derive these from the done states. Do I populate this key in the tensordict returned by my _step() function?
If the done states are set properly by your _step function, the “_reset” key will be written where appropriate.
Then it’s up to your batched env to deal with these!
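To make "dealing with these" a bit more concrete, here is a minimal sketch in plain PyTorch of how a batched env could report per-entry done flags and then reset only the flagged entries. This is not the TorchRL API: ToyBatchedEnv, its 1-D position state, and the bound/goal values are all made up for illustration, and the boolean mask passed to reset() only plays the role of the “_reset” entry discussed above.

```python
from typing import Dict, Optional

import torch


class ToyBatchedEnv:
    """Hypothetical batched env: each of the B sub-envs tracks a 1-D position."""

    def __init__(self, batch_size: int, bound: float = 5.0, goal: float = 3.0,
                 device: str = "cpu"):
        self.batch_size = batch_size
        self.bound = bound  # |pos| > bound counts as failure (assumed rule)
        self.goal = goal    # |pos - goal| < 0.5 counts as success (assumed rule)
        self.device = torch.device(device)
        self.pos = torch.zeros(batch_size, 1, device=self.device)

    def reset(self, reset_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        # reset_mask is a bool tensor of shape [B, 1]; None means "reset everything".
        if reset_mask is None:
            reset_mask = torch.ones(self.batch_size, 1, dtype=torch.bool,
                                    device=self.device)
        # Masked entries return to the initial state; the others keep their state.
        self.pos = torch.where(reset_mask, torch.zeros_like(self.pos), self.pos)
        return self.pos.clone()

    def step(self, action: torch.Tensor) -> Dict[str, torch.Tensor]:
        self.pos = self.pos + action
        failure = self.pos.abs() > self.bound
        success = (self.pos - self.goal).abs() < 0.5
        # done keeps the batch dimension: one flag per sub-env, shape [B, 1].
        return {
            "observation": self.pos.clone(),
            "done": failure | success,
            "success": success,
        }


env = ToyBatchedEnv(batch_size=3)
env.reset()
out = env.step(torch.tensor([[3.0], [1.0], [-6.0]]))
done = out["done"]     # entry 0 reached the goal, entry 2 left the bounds
obs = env.reset(done)  # partial reset: only entries 0 and 2 are re-initialized
```

Using torch.where keeps the partial reset fully vectorized, so there is no Python loop over the batch and everything stays on whatever device the state lives on.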