Preparing a dataset for video analysis

Hi everyone,
I've done several projects on deep-learning-based image processing (classification, …), but I am new to video processing in PyTorch.
I want to analyze video sequences, for example for activity recognition, and I have run into problems preparing the dataset for this task.
I have a training set with the different actions in subfolders, and I want to produce a sliding window (e.g. a sequence of 10 frames) to feed to the model (I want my input size to be 10 × 227 × 227). How can I do this?
The order of the frames is important, and I don't want the frames to be picked randomly!
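To make the sliding window concrete at the tensor level, here is a minimal sketch (not from the thread) assuming one activity's frames are already loaded, in order, as a single grayscale tensor of shape (200, 227, 227), so each window is 10 frames of 227 × 227. The random tensor is a stand-in for real frames. `Tensor.unfold` slides a window of 10 along the time axis with step 1, so window n contains frames n to n+9 and the order is never shuffled.

```python
import torch

# Stand-in for one activity: 200 grayscale frames of 227x227, in temporal order.
frames = torch.rand(200, 227, 227)

# unfold(dim, size, step) slides a window of `size` along `dim` with the given step.
# Result has shape (num_windows, 227, 227, 10); permute moves the time axis to dim 1.
windows = frames.unfold(0, 10, 1).permute(0, 3, 1, 2)

print(windows.shape)  # torch.Size([191, 10, 227, 227])
```

`unfold` and `permute` return views, so no frames are copied. This is handy for checking the shapes, but for training with several activities a `Dataset`/`Sampler` setup is more flexible.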

Have a look at this post.
In that topic, the user was dealing with different activities performed by different persons.
With a custom sampler, you could implement a sliding window by providing the “invalid” frame indices, i.e. the indices from which the window should not grab images.
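A minimal sketch of that idea, written for illustration rather than copied from the linked post: all frames are concatenated into one tensor, a hypothetical helper `invalid_starts` marks the start indices whose window would cross into the next activity, and a custom `Sampler` yields only the remaining start indices, in temporal order. The names `FrameWindowDataset`, `SlidingWindowSampler`, the `boundaries` list, and the 8×8 toy frames are all assumptions.

```python
import torch
from torch.utils.data import DataLoader, Dataset, Sampler


class FrameWindowDataset(Dataset):
    """Index i returns frames[i : i + window] from one concatenated frame tensor."""

    def __init__(self, frames, window=10):
        self.frames = frames  # shape (num_frames, H, W), all activities back to back
        self.window = window

    def __len__(self):
        return len(self.frames) - self.window + 1

    def __getitem__(self, start):
        return self.frames[start:start + self.window]


def invalid_starts(boundaries, window):
    """Start indices whose window would cross into the next activity.

    `boundaries` holds the index of the first frame of each activity.
    A window starting at s covers s .. s + window - 1, so it is invalid
    if some boundary b satisfies s < b <= s + window - 1.
    """
    return {s for b in boundaries if b > 0 for s in range(max(0, b - window + 1), b)}


class SlidingWindowSampler(Sampler):
    """Yields every valid window start index in temporal order (no shuffling)."""

    def __init__(self, num_frames, window, invalid):
        self.starts = [s for s in range(num_frames - window + 1) if s not in invalid]

    def __iter__(self):
        return iter(self.starts)

    def __len__(self):
        return len(self.starts)


# Toy data: 2 activities x 200 frames of 8x8 pixels; each frame is filled with its own index.
window = 10
frames = torch.arange(400, dtype=torch.float32).view(400, 1, 1).repeat(1, 8, 8)
boundaries = [0, 200]

dataset = FrameWindowDataset(frames, window)
sampler = SlidingWindowSampler(len(frames), window, invalid_starts(boundaries, window))
loader = DataLoader(dataset, batch_size=4, sampler=sampler)

first_batch = next(iter(loader))
print(len(sampler), first_batch.shape)  # 382 torch.Size([4, 10, 8, 8])
```

Since the sampler yields indices in order, windows arrive in temporal order. Shuffling `self.starts` inside `__iter__` would randomize the window order while still keeping the frames inside each window consecutive.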

Would the code example be suitable as starter code?

PS: Tagging certain people might discourage others from answering in your thread. :wink:

Thank you very much.
I did as you recommended.
I had already found that post, and it was quite close to my problem, but what about defining a dataset class?

Do you want to feed only 10 frames from each video to the model?

Thank you very much for your help.

For example, I have a training set composed of 13 activity subfolders, and each activity subfolder contains 200 frames.
I want to feed my data to the model like this:

batch 0: img0, img1, …, img9
batch 1: img1, img2, …, img10
batch n: img(n), img(n+1), …, img(n+9)
for each activity class.
How should I define the dataset class?
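One way to define such a dataset, sketched under the assumption that each activity's frames are already loaded in temporal order as a tensor of shape (num_frames, H, W): precompute an (activity, start) pair for every window and slice the frames in `__getitem__`, so a window never mixes two activities. `ActivityWindowDataset` is a hypothetical name; with real data you would more likely store the sorted file paths of each subfolder and load the 227×227 images lazily. The toy data uses 16×16 frames to stay small.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ActivityWindowDataset(Dataset):
    """Sliding windows over each activity separately; item order = temporal order.

    `videos` is a list of tensors, one per activity, each of shape (num_frames, H, W).
    Item k of activity c is frames[k : k + window] of that activity only.
    """

    def __init__(self, videos, labels, window=10, stride=1):
        self.videos = videos
        self.labels = labels
        self.window = window
        # Precompute (activity index, start frame) for every window, in order.
        self.index = [
            (v, start)
            for v, frames in enumerate(videos)
            for start in range(0, len(frames) - window + 1, stride)
        ]

    def __len__(self):
        return len(self.index)

    def __getitem__(self, idx):
        v, start = self.index[idx]
        clip = self.videos[v][start:start + self.window]  # (window, H, W)
        return clip, self.labels[v]


# Toy data: 13 activities x 200 frames of 16x16 pixels.
videos = [torch.rand(200, 16, 16) for _ in range(13)]
labels = list(range(13))

dataset = ActivityWindowDataset(videos, labels, window=10)
loader = DataLoader(dataset, batch_size=8, shuffle=False)  # shuffle=False keeps the order

clips, targets = next(iter(loader))
print(len(dataset), clips.shape)  # 2483 torch.Size([8, 10, 16, 16])
```

With 13 activities of 200 frames and a window of 10, this gives 13 × 191 = 2483 windows. "batch n" in the question corresponds to dataset item n here; the `DataLoader` then groups several consecutive items into a batch of shape (batch_size, 10, H, W). If the model expects a channel dimension, `clip.unsqueeze(1)` gives (10, 1, H, W).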

Got it, thank you for the explanation.
I dove into this thread because I'm working on some kind of action recognition too. But in my case, I'm feeding the network a whole sequence at a time. As for your case, I'm almost sure you should look into Sampler.

Thank you all very much for your great comments and help.
I really appreciate it. I'm digging into samplers to fully understand your comments and solve this.

Can you please explain how to use collate_fn in the DataLoader? I have 20 sentences for one image and need to select a random sentence to train the decoder.
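A minimal sketch, assuming each dataset item is an image tensor plus a list of 20 already-tokenized captions (1-D LongTensors of varying length). The default collate function would fail when the captions have different lengths, so a custom `collate_fn` stacks the images, picks one caption per image with `random.choice`, and pads the chosen captions with `torch.nn.utils.rnn.pad_sequence`. `CaptionDataset`, `random_caption_collate`, the padding index 0, and the toy data are hypothetical.

```python
import random

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class CaptionDataset(Dataset):
    """Toy dataset: each item is (image, list of caption token tensors)."""

    def __init__(self, images, captions):
        self.images = images      # tensor (N, C, H, W)
        self.captions = captions  # list of N lists, each holding several 1-D LongTensors

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.captions[idx]


def random_caption_collate(batch, pad_idx=0):
    """Stack images, pick one random caption per image, and pad captions to equal length."""
    images = torch.stack([image for image, _ in batch])
    chosen = [random.choice(captions) for _, captions in batch]
    lengths = torch.tensor([len(c) for c in chosen])
    padded = pad_sequence(chosen, batch_first=True, padding_value=pad_idx)
    return images, padded, lengths


# Toy data: 6 images, each with 20 captions of 5 to 12 tokens (token ids 1..99, 0 = padding).
random.seed(0)
torch.manual_seed(0)
images = torch.randn(6, 3, 32, 32)
captions = [
    [torch.randint(1, 100, (random.randint(5, 12),)) for _ in range(20)]
    for _ in range(6)
]

loader = DataLoader(CaptionDataset(images, captions), batch_size=4, shuffle=True,
                    collate_fn=random_caption_collate)
batch_images, batch_captions, lengths = next(iter(loader))
print(batch_images.shape, batch_captions.shape, lengths)
```

Selecting the caption inside `__getitem__` works just as well and leaves the collate function responsible only for padding. Either way, a new caption is drawn each time an image is loaded, so the decoder sees different captions for the same image across epochs.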

Late, but this may help whoever still needs it.