I'm trying to use VGG-M to extract video features, but I have run into some problems.
I am supposed to use a CNN to extract the features. The CNN ingests a sliding window of 5 frames, which means the input to the network is 5 grayscale images stacked as 5 channels. However, the VGG-M model I downloaded is set up for a 3-channel (RGB) input. I tried changing the input dimension, but I got a dimension mismatch error. How can I build a 5-channel convolutional filter at conv1 and still use the rest of the VGG-M model?
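To make the question concrete, here is a minimal sketch of the kind of conv1 replacement I have in mind. It is only an illustration under assumptions: the stand-in layer (96 filters, 7x7 kernel, stride 2) is my guess at VGG-M's conv1 and is not read from vggm.py, widen_first_conv is a hypothetical helper name, and summing the RGB filters and spreading them over 5 channels is just one possible way to initialise the new weights.

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained first layer. The 96 filters, 7x7 kernel and
# stride 2 are assumptions about VGG-M's conv1, not taken from vggm.py.
old_conv = nn.Conv2d(3, 96, kernel_size=7, stride=2)


def widen_first_conv(conv, in_channels):
    """Hypothetical helper: copy `conv` but accept `in_channels` inputs."""
    new_conv = nn.Conv2d(
        in_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        # Sum the RGB filters into one grayscale filter, then share it
        # equally across all input channels so that identical frames give
        # the same response as a gray image did in the original layer.
        gray = conv.weight.sum(dim=1, keepdim=True)  # (out, 1, k, k)
        new_conv.weight.copy_(gray.repeat(1, in_channels, 1, 1) / in_channels)
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv


conv1_5ch = widen_first_conv(old_conv, 5)

# One sliding window of 5 grayscale frames, stacked along the channel axis.
frames = torch.randn(1, 5, 224, 224)
out = conv1_5ch(frames)
print(conv1_5ch.weight.shape, out.shape)
```

If conv1 turns out to be the first module of model.features, I assume the new layer could be swapped in with something like model.features[0] = conv1_5ch, but I have not confirmed that this matches the layout in vggm.py.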
I also tried extracting features from a 3-channel (RGB) image. After model.features the output has shape (1, 512, 3, 3). But according to the code in vggm.py, if I use classif, its input should have 18432 values, and 512 x 3 x 3 = 4608, not 18432. I tried changing the dimension here as well, but I still get a dimension mismatch error.
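The mismatch can be checked with a few lines of arithmetic. This sketch only uses the two numbers above (the 512-channel output and the 18432 expected by classif) and assumes the feature map is square:

```python
import math

channels = 512
fc_in = 18432  # input size expected by classif in vggm.py

# What my (1, 512, 3, 3) feature map flattens to.
got = channels * 3 * 3

# Spatial side length the classifier implies, assuming a square map.
side = math.isqrt(fc_in // channels)

print(got, side, channels * side * side == fc_in)
```

So classif seems to expect a 512 x 6 x 6 feature map rather than 512 x 3 x 3, which makes me suspect that my input images are smaller than the size the model was built for, but I am not sure.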
Thank you for your help. I greatly appreciate it.