Hi, I am building a reinforcement learning environment and have a question: how can I process a stack of frames? I plan to do it with Transformers, but the problem is I don't know how to summarize the frames into one state vector. What I mean is: I featurize each frame with a pre-trained model, and then I plan to feed those features into a Transformer network so that I get one vector that summarizes the sequence of frames.
Usually function approximators like CNNs and Transformers take a stack of frames as input directly, so you don't need to do any pre-processing.
What I meant was: I am featurizing the images into vectors, which is why I want to use a ResNet; it's necessary. I am doing the same kind of project as CURL (Contrastive Unsupervised Representations for Reinforcement Learning).
The input is [B, C, F, H, W]: batch size, channels, frames, height, and width.
Hey @dato_nefaridze can you provide a code snippet of some (even buggy) implementation of what you’re trying to do? I’m struggling to understand what it is…