Hi all!
I’m relatively new to PyTorch and I have a CNN that predicts the next frame of a video from 1 input frame relatively well. Now I want the model to output 1 image based on a sequence of images (e.g. give it 2 frames, have it guess the next frame). But I’m not sure how to represent the data in a way that makes sense in PyTorch. Is there a known way to do this?
I’ve considered concatenating the input images along a spatial dimension (height/width), but that doesn’t make intuitive sense, and I’d need extra shenanigans to make the output dimensions work (currently input and output are both single images, so they have the same shape, but if the input is multiple images stacked together that no longer holds).
I’ve also thought about concatenating the input images along the channel dimension, so, for example, 3 RGB images would become a 9-channel input, while the output (1 image) would still have 3 channels. This, again, doesn’t make intuitive sense to me, and I’m also not sure how to do it in PyTorch.
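For reference, here’s roughly what I imagine the channel-stacking version would look like (the shapes and the single conv layer are just a toy example, not my actual model):

```python
import torch
import torch.nn as nn

# Toy example: stack 3 RGB frames along the channel dimension.
# Each frame has shape (batch, 3, H, W); the stacked input is (batch, 9, H, W).
batch, H, W = 4, 64, 64
frames = [torch.randn(batch, 3, H, W) for _ in range(3)]
x = torch.cat(frames, dim=1)  # shape: (batch, 9, H, W)

# A conv layer mapping 9 input channels to a 3-channel output frame;
# kernel_size=3 with padding=1 keeps the spatial size unchanged.
conv = nn.Conv2d(in_channels=9, out_channels=3, kernel_size=3, padding=1)
out = conv(x)
print(out.shape)  # torch.Size([4, 3, 64, 64])
```

So the question is really whether this channel-stacking approach is reasonable, or whether there’s a more standard way to feed a frame sequence to a CNN.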
Thank you in advance for any helpful input!