I am working on video classification for motion recognition.

I selected 10 frames from a video and applied optical flow to these sequential frames.

After that, I took the x and y components of each flow and stacked them together.

Finally, I have a 3D matrix with shape H x W x 20.

H = height

W = width

20 = flows from the 10 frames (x2 because each optical flow has an x and a y component).
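The stacking described above can be sketched with NumPy. The random arrays below stand in for real optical-flow fields (e.g. the output of OpenCV's `cv2.calcOpticalFlowFarneback`); the frame size is an arbitrary example:

```python
import numpy as np

H, W = 224, 224    # example frame size
num_flows = 10     # one flow field per selected frame

# Each flow field has shape (H, W, 2): x and y displacement per pixel.
# Random data stands in for the real optical-flow output here.
flows = [np.random.randn(H, W, 2).astype(np.float32) for _ in range(num_flows)]

# Concatenate along the channel axis: 10 x (H, W, 2) -> (H, W, 20)
stacked = np.concatenate(flows, axis=2)
print(stacked.shape)  # (224, 224, 20)
```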

So, I want to apply a 3D convolution layer to this matrix.

When I look at the PyTorch documentation for 3D convolution, I see a 5-dimensional input of shape (N, C, D, H, W).

But my input is 4-dimensional, (N, C, H, W), with N samples.

So, how can I apply a 3D convolution to my matrix?

N is the batch size, i.e. the number of samples. In your case, a sample is not a single frame: your sample is H x W x 1 x 20 (C = 1, you have only one channel).

To map that to (N, C, D, H, W): C = 1, D = 20, H = H, and W = W. N depends on your batch size, and on whether it fits in memory.
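A minimal sketch of that mapping, assuming a single stacked-flow sample of shape (H, W, 20) (the spatial size and layer hyperparameters below are arbitrary examples):

```python
import numpy as np
import torch
import torch.nn as nn

H, W = 64, 64
x = np.random.randn(H, W, 20).astype(np.float32)  # stacked flows, (H, W, 20)

# Rearrange (H, W, 20) -> (N=1, C=1, D=20, H, W)
t = torch.from_numpy(x).permute(2, 0, 1)  # (20, H, W)
t = t.unsqueeze(0).unsqueeze(0)           # (1, 1, 20, H, W)

# Conv3d expects (N, C, D, H, W); padding=1 with kernel_size=3 keeps D, H, W.
conv = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
out = conv(t)
print(out.shape)  # torch.Size([1, 8, 20, 64, 64])
```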

Thanks a lot for the reply.

I want to ask one more question.

As @ebarsoum mentioned, I have an H x W x 1 x 20 NumPy uint8 array.

I want to implement a dataset class that inherits from torch.utils.data.Dataset.

In the `__getitem__` method, I return this array after applying transforms like:

transforms.RandomCrop,

transforms.RandomHorizontalFlip(),

transforms.ColorJitter(),

transforms.ToTensor(),

transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]).

Applying the first three of them requires a PIL image.

How can I handle this?
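One common workaround, since a 20-channel array cannot be converted to a PIL image, is to implement the spatial transforms directly on the array inside `__getitem__`. A minimal sketch (the class name, crop size, and normalisation stats are made up for illustration; the ImageNet means/stds from above are 3-channel and would not apply to a 20-channel flow stack anyway):

```python
import random
import numpy as np
import torch
from torch.utils.data import Dataset

class FlowDataset(Dataset):
    """Sketch: applies crop/flip/normalise directly to the NumPy array,
    avoiding the PIL-only torchvision transforms."""

    def __init__(self, arrays, crop_size=56):
        self.arrays = arrays        # list of (H, W, 20) uint8 arrays
        self.crop_size = crop_size

    def __len__(self):
        return len(self.arrays)

    def __getitem__(self, idx):
        a = self.arrays[idx]
        # Random crop on the spatial dimensions.
        h, w, _ = a.shape
        top = random.randint(0, h - self.crop_size)
        left = random.randint(0, w - self.crop_size)
        a = a[top:top + self.crop_size, left:left + self.crop_size]
        # Random horizontal flip. (Note: strictly, flipping a flow field
        # also requires negating its x-displacement channels.)
        if random.random() < 0.5:
            a = a[:, ::-1].copy()
        # ToTensor equivalent: HWC uint8 -> CHW float in [0, 1].
        t = torch.from_numpy(a.transpose(2, 0, 1).copy()).float() / 255.0
        # Example normalisation; compute real per-channel stats for your flows.
        t = (t - 0.5) / 0.5
        return t

# Usage with dummy data:
data = [np.random.randint(0, 256, (64, 64, 20), dtype=np.uint8) for _ in range(4)]
ds = FlowDataset(data)
print(ds[0].shape)  # torch.Size([20, 56, 56])
```

ColorJitter is defined for RGB images, so it is usually dropped (or replaced by additive noise) for optical-flow inputs.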