Passing a variable number of frames to my LSTM layer

I’m trying to build a video classification model. For a single video I pass 1–N frames, where N depends on the length of the video, so I want to be able to send a variable number of frames per video. The problem is how to pass a variable number of frames to my LSTM layer. Is padding the only solution? I would really rather avoid padding, since the video lengths range from 15 seconds to 6 minutes or even more. I also don’t want to keep my batch size constant (i.e. batch size = 1).
Please help

The typical solution is to group the videos by length so you get by with as little padding as possible. There are various ways to implement this “bucketized” sampling: TorchText has something like it, and I sometimes implement it myself by grouping the videos in the dataset and returning whole batches from the dataset (using batch size 1 in the DataLoader and squeezing the extra dimension).
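A minimal sketch of what I mean (the dataset, lengths, and feature dimension are made up for illustration): the dataset sorts the clips by length and hands out pre-built buckets, so each bucket only needs padding up to its own longest clip, and the DataLoader runs with `batch_size=1`:

```python
import random
import torch
from torch.utils.data import Dataset, DataLoader

# Hypothetical data: each "video" is a (num_frames, feature_dim) tensor
# with a random number of frames.
feature_dim = 32
videos = [torch.randn(random.randint(10, 50), feature_dim) for _ in range(100)]

class BucketedVideoDataset(Dataset):
    """Groups similar-length videos into batches inside the dataset itself."""
    def __init__(self, videos, bucket_size):
        # Sort indices by clip length so each bucket holds similar lengths.
        order = sorted(range(len(videos)), key=lambda i: videos[i].shape[0])
        self.batches = [
            [videos[i] for i in order[j:j + bucket_size]]
            for j in range(0, len(order), bucket_size)
        ]

    def __len__(self):
        return len(self.batches)

    def __getitem__(self, idx):
        # Pad only up to the longest clip within this bucket.
        return torch.nn.utils.rnn.pad_sequence(self.batches[idx], batch_first=True)

loader = DataLoader(BucketedVideoDataset(videos, bucket_size=8),
                    batch_size=1, shuffle=True)
for padded in loader:
    # Drop the DataLoader's extra batch dimension:
    # (1, bucket, max_len_in_bucket, feat) -> (bucket, max_len_in_bucket, feat)
    padded = padded.squeeze(0)
```

Because the clips inside a bucket are close in length, the wasted computation on padded frames stays small even when the overall lengths vary a lot.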

Best regards


@tom can you please explain how to apply the padding operation to videos? I do not understand how padding is supposed to help resolve my problem.
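For context, in PyTorch padding for an LSTM is usually paired with `pack_padded_sequence`, which records the true length of each sequence so the LSTM effectively ignores the padded timesteps. A minimal sketch with made-up clip lengths and feature sizes:

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

feature_dim, hidden_dim = 32, 64
# Three "videos" of 15, 40 and 22 frames; each frame is a feature vector.
clips = [torch.randn(n, feature_dim) for n in (15, 40, 22)]
lengths = torch.tensor([c.shape[0] for c in clips])

# Pad all clips to the longest one: (batch, max_len, feature_dim).
padded = pad_sequence(clips, batch_first=True)

# Packing tells the LSTM the true lengths, so padded timesteps
# never influence the final hidden states.
packed = pack_padded_sequence(padded, lengths,
                              batch_first=True, enforce_sorted=False)

lstm = torch.nn.LSTM(feature_dim, hidden_dim, batch_first=True)
_, (h_n, _) = lstm(packed)
# h_n[-1] holds each clip's hidden state at its *true* last frame,
# shape (batch, hidden_dim) -- usable as a per-video feature for the classifier.
```

So padding is only a storage trick to get rectangular tensors; with packing, the model itself still sees each video at its real length.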