I am working on the segmentation of a video of a person walking. My goal is to segment the body parts of the person.
I have used a U-Net model to segment each frame individually. It works well for separating the person from the background, but not for segmenting the individual body parts. I want to see if I can improve the results by leveraging the temporal order of the frames.
I’m wondering how to organize the images into sequences for a 3D CNN. Should I use the same number of images in each sequence, say five? Then I could have a batch of 50 sequences, each containing five consecutive images. Or would it be better to choose the number of images in each sequence at random? Also, should I shuffle the order of the sequences in the batch?
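To make the fixed-length option concrete, here is a minimal NumPy sketch of what I have in mind. It is only an illustration. The names `make_clips`, `iterate_batches`, and `stride` are hypothetical, the video is a dummy array, and the channels-first `(batch, channels, time, height, width)` layout is an assumption based on what 3D convolution layers such as PyTorch's `Conv3d` expect. The order of the clips is shuffled, but the frames inside each clip stay in temporal order.

```python
import numpy as np


def make_clips(frames, clip_len=5, stride=1):
    """Cut a video of shape (num_frames, H, W, C) into clips of
    clip_len consecutive frames, starting every `stride` frames."""
    num_frames = frames.shape[0]
    if num_frames < clip_len:
        raise ValueError("video is shorter than one clip")
    starts = np.arange(0, num_frames - clip_len + 1, stride)
    # Row i of idx holds the frame indices of clip i, in temporal order.
    idx = starts[:, None] + np.arange(clip_len)[None, :]
    return frames[idx]  # shape: (num_clips, clip_len, H, W, C)


def iterate_batches(clips, batch_size=50, seed=0):
    """Yield batches of clips in shuffled order.
    Only the clip order is shuffled, never the frames within a clip."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(clips))
    for i in range(0, len(clips), batch_size):
        batch = clips[order[i:i + batch_size]]
        # (B, T, H, W, C) -> (B, C, T, H, W), an assumed channels-first layout
        yield np.transpose(batch, (0, 4, 1, 2, 3))


# Dummy video: 104 frames of 8x8 RGB, where every pixel of frame t equals t,
# so the temporal order inside a clip is easy to check.
video = np.zeros((104, 8, 8, 3), dtype=np.float32)
video += np.arange(104, dtype=np.float32)[:, None, None, None]

clips = make_clips(video, clip_len=5, stride=1)
first_batch = next(iterate_batches(clips, batch_size=50))
print(clips.shape, first_batch.shape)
```

With `stride=1` the clips overlap heavily. Setting `stride=clip_len` would give non-overlapping clips instead. The segmentation masks would need to be sliced with the same indices so that each clip keeps its matching labels.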