Data loader for 3D convolution on video frames

I am working with a dataset of videos, and I have preprocessed each video into a set of frames. I am implementing a network that uses 3D convolution, where I have to pass chunks of video to be processed by the network.
I know that a Conv3d layer takes a 5D tensor (N, C, D, H, W). How do I modify the data loader so that I get the data in the appropriate shape?

Say every video has D frames of size h x w. For every batch, you first fetch the frames; assuming each frame is an RGB image, each frame has shape h x w x 3. Stack the frames to get a tensor T of shape D x h x w x 3, then permute the axes: T.permute(3, 0, 1, 2) gives shape 3 x D x h x w, i.e. (C, D, H, W). The batch dimension N will be added by the DataLoader itself.
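A minimal sketch of the stack-and-permute step, using random tensors in place of real frames (the sizes D=16, h=w=64 are just illustrative):

```python
import torch

# Hypothetical example: D = 16 RGB frames of size 64 x 64
D, h, w = 16, 64, 64
frames = [torch.rand(h, w, 3) for _ in range(D)]  # one (h, w, 3) tensor per frame

T = torch.stack(frames)       # (D, h, w, 3)
clip = T.permute(3, 0, 1, 2)  # (3, D, h, w) = (C, D, H, W)
print(clip.shape)             # torch.Size([3, 16, 64, 64])
```

The DataLoader then stacks such clips along a new first dimension, producing the (N, C, D, H, W) tensor that Conv3d expects.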

I can do that, but I want to fetch batches that are already composed of chunks of frames, meaning each example in the batch has a depth (a number of frames), a height, and a width.

You can do this by overriding __getitem__ in a custom implementation of Dataset, where you load the frames and perform the reshaping described above. You can refer here for the implementation of a custom Dataset and DataLoader.

Hi Soum,

I am very happy to see your post about working on video-processing tasks; fortunately, I'm also working on a video-processing task using PyTorch. Could you please tell me how you wrote the data loader for 3D input videos?