I have a neural network that starts with some convolutional layers, then an LSTM layer, and finally some deconvolutional layers: a kind of encoder-decoder architecture with an LSTM in the middle.
A sample in my dataset is a sequence of 4 images with shape [4, 3, H, W]. So, when I want to use batches, with batch_size=8 for example, the resulting tensor would have shape [8, 4, 3, H, W].
The problem is that the Conv2d layer does not accept 5-dimensional tensors; it expects 4-dimensional input of shape [N, C, H, W].
I’ve tried reshaping the tensor with x.view(-1, 3, H, W), which gives shape [32, 3, H, W]. But I’m worried this is incorrect, since the sequences are unrelated to each other (they are sampled randomly): sequence 1 (4 contiguous images) in the batch is not the predecessor of sequence 2 (another 4 contiguous images) in the same batch.
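For reference, here is a minimal sketch of the reshape I tried (H and W are just placeholder sizes):

```python
import torch

batch_size, seq_len, channels, H, W = 8, 4, 3, 64, 64
x = torch.randn(batch_size, seq_len, channels, H, W)  # [8, 4, 3, 64, 64]

# Merge the batch and sequence dimensions so Conv2d accepts the input
x_flat = x.view(-1, channels, H, W)
print(x_flat.shape)  # torch.Size([32, 3, 64, 64])
```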
How can I feed my network this 5-dimensional tensor while keeping both the batch information and the sequence information?
The CNN only looks at each image independently of the others; it won’t treat them as a sequence in any way unless you concatenate them somehow, e.g. along the channel dimension. It’s a little unclear what you’re trying to accomplish, but my understanding is that you want the CNN to extract features from the images and then run the LSTM on those features.
So what I’m thinking you could do is exactly what you described: process each image separately by doing x.view(batch_size * 4, 3, H, W). After the CNN, the output has shape (batch_size * 4, C, H_new, W_new), and you can reshape it for the LSTM with x.view(batch_size, 4, -1) if you’re using batch_first=True for the LSTM. The LSTM then processes the 4 feature vectors of each sample as a sequence, and the entries of the batch stay independent of each other, so merging the dimensions for the CNN doesn’t mix your unrelated sequences.
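As a rough end-to-end sketch (the conv channels, image size, and LSTM hidden size here are placeholders; swap in your own encoder):

```python
import torch
import torch.nn as nn

batch_size, seq_len = 8, 4
H = W = 64  # assumed image size for the example
x = torch.randn(batch_size, seq_len, 3, H, W)  # [8, 4, 3, 64, 64]

# Placeholder CNN feature extractor (your conv encoder goes here)
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),   # -> [N, 16, 32, 32]
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # -> [N, 32, 16, 16]
    nn.ReLU(),
)

# 1) Merge batch and sequence dims so Conv2d sees a 4-D tensor
feats = cnn(x.view(batch_size * seq_len, 3, H, W))  # [32, 32, 16, 16]

# 2) Split batch/sequence back out and flatten the spatial features
feats = feats.view(batch_size, seq_len, -1)  # [8, 4, 32*16*16]

# 3) Run the LSTM over the sequence dimension (batch_first=True)
lstm = nn.LSTM(input_size=32 * 16 * 16, hidden_size=256, batch_first=True)
out, (h, c) = lstm(feats)
print(out.shape)  # torch.Size([8, 4, 256])
```

Because the LSTM’s first dimension (with batch_first=True) is the batch, the 8 sequences are processed in parallel and never influence each other; only the 4 steps within each sequence are processed recurrently.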