Question 1: Which way should I follow to train on a sequence of images with a CNN-LSTM architecture? Should I extract the features of all images in the sequence with the CNN and pass that feature sequence to the LSTM, or should I feed one image at a time through the CNN and one step of the LSTM?
Let's assume we use a fixed-length LSTM.
way 1: extract all features first, then run the LSTM once over the whole sequence: f1 = CNN(image1), f2 = CNN(image2), ... final_output = LSTM([f1, f2, ...])
way 2: step the LSTM frame by frame, feeding each CNN feature together with the previous hidden state: hid1 = LSTM(CNN(image1), hid0), hid2 = LSTM(CNN(image2), hid1), and so on
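To make way 1 concrete, here is a minimal PyTorch sketch of what I mean (the tiny CNN, feature size, and shapes are just placeholders, not my real model): one shared CNN encodes every frame, and the LSTM then consumes the sequence of features in a single call.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Way 1: encode all frames with one shared CNN, then run the LSTM once."""
    def __init__(self, feat_dim=32, hidden_dim=64, num_classes=10):
        super().__init__()
        # Placeholder CNN; in practice this could be VGG16's feature extractor.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                     # x: (batch, time, 3, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1))     # fold time into batch: (b*t, feat_dim)
        feats = feats.view(b, t, -1)          # restore sequence: (b, t, feat_dim)
        out, _ = self.lstm(feats)             # (b, t, hidden_dim)
        return self.head(out[:, -1])          # classify from the last time step

model = CNNLSTM()
logits = model(torch.randn(2, 5, 3, 32, 32))  # 2 clips of 5 RGB frames each
print(logits.shape)                           # torch.Size([2, 10])
```

Way 2 would instead call `self.lstm` once per frame, passing the `(h, c)` state returned by the previous step; with the same weights both unrollings compute the same function, so the choice is mainly about implementation convenience and memory.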
Do you have an optimized implementation of a sequential CNN-RNN where a sequence of images is used as the CNN input? I tried to implement it with VGG16, but it runs out of memory.
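One workaround I have been considering for the out-of-memory issue (sketched here with a tiny stand-in CNN rather than real VGG16): if the CNN is frozen, its features can be computed under `torch.no_grad()`, so no CNN activations are kept for backpropagation and only the LSTM part of the graph uses training memory.

```python
import torch
import torch.nn as nn

# Stand-in for a frozen pretrained backbone (e.g. VGG16's convolutional part).
cnn = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
cnn.eval()                                  # frozen feature extractor
lstm = nn.LSTM(8, 16, batch_first=True)

frames = torch.randn(2, 5, 3, 32, 32)       # (batch, time, C, H, W)
with torch.no_grad():                       # no CNN activations stored
    feats = cnn(frames.flatten(0, 1)).view(2, 5, -1)   # (2, 5, 8)

out, _ = lstm(feats)                        # only the LSTM is trained
print(out.shape)                            # torch.Size([2, 5, 16])
```

If the CNN also needs fine-tuning, reducing the batch size, shortening the sequence length, or gradient checkpointing (`torch.utils.checkpoint`) are the usual alternatives.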