How to sample transitions in vectorized envs for off-policy algos

kbtorcher · August 21, 2023, 1:17pm

How is minibatch sampling implemented for vectorized envs (assume the envs are processed sequentially) in off-policy RL? Let’s say there are 5 envs. Every step, the agent gets 5 transitions and pushes them to the buffer. During critic and policy updates, do we randomly sample N individual transitions, or do we randomly sample N timesteps, which would effectively give us N*5 transitions?

J_Johnson · August 23, 2023, 11:57am

Ideally, sample individual transitions at random once they are stored in the memory buffer.
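As a rough illustration of that reading, here is a minimal sketch of a flat replay buffer in which each of the 5 envs contributes one independent entry per step, and a minibatch is N transitions drawn uniformly regardless of env or timestep. The `ReplayBuffer` class, its `add_vectorized` method, and the toy `(env, t)` observations are all hypothetical names made up for this example; they are not taken from any particular library.

```python
import random


class ReplayBuffer:
    """Flat ring buffer: every entry is one transition from one env."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.storage = []
        self.pos = 0  # next slot to overwrite once the buffer is full
        self.rng = random.Random(seed)

    def add(self, transition):
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.pos = (self.pos + 1) % self.capacity

    def add_vectorized(self, obs, actions, rewards, next_obs, dones):
        # One call per vectorized step; each env adds its own transition.
        for transition in zip(obs, actions, rewards, next_obs, dones):
            self.add(transition)

    def sample(self, batch_size):
        # Uniform over individual transitions, without replacement.
        return self.rng.sample(self.storage, batch_size)

    def __len__(self):
        return len(self.storage)


NUM_ENVS = 5
buffer = ReplayBuffer(capacity=10_000, seed=0)

# Toy rollout: observations are just (env_id, timestep) placeholders.
for t in range(100):
    obs = [(env, t) for env in range(NUM_ENVS)]
    actions = [0] * NUM_ENVS
    rewards = [1.0] * NUM_ENVS
    next_obs = [(env, t + 1) for env in range(NUM_ENVS)]
    dones = [False] * NUM_ENVS
    buffer.add_vectorized(obs, actions, rewards, next_obs, dones)

batch = buffer.sample(32)  # N = 32 transitions, not 32 * 5
```

With this layout the number of envs only affects how fast the buffer fills; the batch size passed to `sample` is the number of transitions used in the update.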

The order in which you feed the data makes no real difference to the model, unless the model contains a memory state (e.g., a recurrent layer). In that case you would definitely need to feed the data sequentially.
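For the memory-state case, one possible approach (an assumption for illustration, not something the thread prescribes) is to keep transitions grouped per env and sample contiguous windows, so each sequence comes from a single env in time order. The helper `sample_sequences` and the toy data are hypothetical, and the sketch ignores episode boundaries, which a real implementation would need to handle.

```python
import random


def sample_sequences(per_env_transitions, batch_size, seq_len, rng):
    """Draw batch_size contiguous windows of seq_len steps, each from one env."""
    batch = []
    for _ in range(batch_size):
        env_steps = rng.choice(per_env_transitions)  # pick one env
        start = rng.randrange(len(env_steps) - seq_len + 1)
        batch.append(env_steps[start:start + seq_len])
    return batch


rng = random.Random(0)
NUM_ENVS, NUM_STEPS = 5, 50

# Toy data: each transition is just an (env_id, timestep) placeholder.
per_env = [[(env, t) for t in range(NUM_STEPS)] for env in range(NUM_ENVS)]

sequences = sample_sequences(per_env, batch_size=4, seq_len=8, rng=rng)
```

Which sequences end up in a batch is still random; only the order of steps inside each sequence is preserved.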