CNN-RNN implementation for video classification on image sequences

Question 1: Which approach should I follow to train a CNN-LSTM architecture on a sequence of images? Should I extract the features of every image in the sequence with the CNN and pass those features to the LSTM, or should I feed each image through its own CNN and its own LSTM?

Assume we use a fixed-length LSTM.
Way 1: a[0] = CNN(image1), a[1] = CNN(image2), and so on; final_output = LSTM(a)
Way 2: hid1 = LSTM1(CNN(image1)), hid2 = hid1 + LSTM2(CNN(image2)), and so on
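A minimal PyTorch sketch of way 1, assuming clips shaped (batch, time, channels, height, width). The frames are folded into the batch dimension so one shared CNN encodes all of them in a single call, then the features are unfolded back into a sequence for `nn.LSTM`. `SmallCNN`, `CNNLSTM`, `step_by_step` and all sizes are hypothetical placeholders (a real model would use a backbone such as VGG16). The `step_by_step` helper shows a variant of way 2 in which the CNN and LSTM weights are *shared* across time steps (not separate LSTM1, LSTM2, ... as written above) and the cell is stepped manually; with copied weights it should produce the same logits as way 1.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Hypothetical tiny CNN standing in for a real backbone such as VGG16."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # (N, 8, 1, 1)
            nn.Flatten(),             # (N, 8)
            nn.Linear(8, feat_dim),   # (N, feat_dim)
        )

    def forward(self, x):
        return self.net(x)

class CNNLSTM(nn.Module):
    """Way 1: one shared CNN encodes every frame, one LSTM reads the features."""
    def __init__(self, feat_dim=32, hidden=64, num_classes=5):
        super().__init__()
        self.cnn = SmallCNN(feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, video):
        # video: (B, T, C, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))  # fold time into batch: (B*T, feat_dim)
        feats = feats.view(b, t, -1)           # unfold: (B, T, feat_dim)
        out, _ = self.lstm(feats)              # out: (B, T, hidden)
        return self.head(out[:, -1]), feats    # classify from the last time step

def step_by_step(model, feats):
    """Way 2 variant with a shared LSTM cell stepped one frame at a time."""
    cell = nn.LSTMCell(model.lstm.input_size, model.lstm.hidden_size)
    # Copy the layer-0 weights so the cell computes exactly what nn.LSTM computes.
    cell.weight_ih.data.copy_(model.lstm.weight_ih_l0.data)
    cell.weight_hh.data.copy_(model.lstm.weight_hh_l0.data)
    cell.bias_ih.data.copy_(model.lstm.bias_ih_l0.data)
    cell.bias_hh.data.copy_(model.lstm.bias_hh_l0.data)
    h = feats.new_zeros(feats.size(0), cell.hidden_size)
    c = feats.new_zeros(feats.size(0), cell.hidden_size)
    for t in range(feats.size(1)):
        h, c = cell(feats[:, t], (h, c))  # hidden state carried to the next frame
    return model.head(h)

torch.manual_seed(0)
model = CNNLSTM()
video = torch.randn(2, 4, 3, 16, 16)  # 2 clips, 4 frames each, 3x16x16 images
with torch.no_grad():
    logits, feats = model(video)
    logits_loop = step_by_step(model, feats)
print(logits.shape)  # torch.Size([2, 5])
print(torch.allclose(logits, logits_loop, atol=1e-5))
```

Folding time into the batch keeps the CNN call vectorized. Keeping a separate LSTM module per time step, as way 2 is written, would multiply the LSTM parameter count by the sequence length and tie the model to one fixed length.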

Question 2:
Do you have an optimized implementation of a sequential CNN-RNN in which a sequence of images is used as the CNN input? I tried to implement it with VGG16, but it runs out of memory.
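One common way to reduce memory, sketched here under assumptions: when the CNN is trained end to end, autograd keeps the activations of every frame in the clip, which is likely what exhausts memory with a large backbone like VGG16. Freezing the backbone and extracting features under `torch.no_grad()`, in small chunks of frames, means only the LSTM is trained and no per-frame activations are stored for backpropagation. `extract_features`, `FeatureLSTM`, `chunk_size` and the tiny stand-in backbone are hypothetical; something like torchvision's `vgg16(...).features` (followed by pooling) could be passed as `backbone` instead, and the features could also be computed once and cached to disk.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def extract_features(backbone, frames, chunk_size=8):
    """Run a frozen CNN over frames in small chunks; no autograd graph is kept."""
    backbone.eval()
    feats = [backbone(frames[i:i + chunk_size]).flatten(1)
             for i in range(0, frames.size(0), chunk_size)]
    return torch.cat(feats)  # (T, feat_dim)

class FeatureLSTM(nn.Module):
    """Trainable part: an LSTM classifier over precomputed per-frame features."""
    def __init__(self, feat_dim, hidden=128, num_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, feats):  # feats: (B, T, feat_dim)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])

# Hypothetical stand-in backbone; a pretrained VGG16 feature extractor could go here.
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1))
for p in backbone.parameters():
    p.requires_grad_(False)  # frozen: no gradients or optimizer state for the CNN

frames = torch.randn(20, 3, 32, 32)                      # one clip of 20 frames
feats = extract_features(backbone, frames).unsqueeze(0)  # (1, 20, 16)

clf = FeatureLSTM(feat_dim=feats.size(-1), num_classes=10)
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
loss = nn.functional.cross_entropy(clf(feats), torch.tensor([3]))  # dummy label
loss.backward()  # gradients flow only through the LSTM and the linear head
opt.step()
print(feats.shape)  # torch.Size([1, 20, 16])
```

The trade-off is that the CNN is no longer fine-tuned on the video data; if fine-tuning is needed, shorter clips, a smaller backbone, or freezing only the early layers are other options.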

Hi Sohel.

I’m starting on the same task (feeding a sequence of images into a CNN-LSTM model).
Did you solve your problems and get a working model? Or does anyone have a starting point or a good link?