I am trying to do convolution over video frames (like a tube of video).
So I'm reading video frames and shaping each one to
NxCinxDxHxW, where Cin = 3 (channel size), H and W are the spatial dimensions (let's say they are equal), D is 1, and N is the batch size, say 1 for simplicity.
Then I concatenate them along D, so my final tensor has the shape
NxCxDxHxW, where D is the number of frames.
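Here is a minimal sketch of how I build that tensor (random tensors stand in for the real frames, and the sizes are just an example):

```python
import torch

# Each frame is read as N x Cin x 1 x H x W (D = 1 per frame),
# then all frames are concatenated along the depth axis (dim=2).
frames = [torch.randn(1, 3, 1, 10, 10) for _ in range(6)]  # 6 frames
video = torch.cat(frames, dim=2)
print(video.shape)  # torch.Size([1, 3, 6, 10, 10])
```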
Now I want to do a 3D convolution so that the kernel slides along the frames, i.e. the input is
NxCxDxHxW and the kernel spans several frames along D.
Here is an example (note: with padding 0 along D the output depth would be 1, so I've set the D padding to 1 to match the output size shown):

```python
import torch
import torch.nn as nn

m = nn.Conv3d(3, 30, (6, 3, 3), stride=1, padding=(1, 1, 1))
input = torch.randn(1, 3, 6, 10, 10)
output = m(input)
output.size()  # torch.Size([1, 30, 3, 10, 10])
```
I don't get the concept of padding along D. How does it happen?
Can you please tell me how I should do it?
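To check my understanding, I tried the usual convolution output-size formula on the D dimension (the helper function name is mine):

```python
def conv_out_size(d_in, kernel, stride=1, padding=0):
    # Standard conv size formula: floor((d_in + 2*padding - kernel) / stride) + 1
    return (d_in + 2 * padding - kernel) // stride + 1

# Depth dimension of the example: D_in = 6, kernel depth 6
print(conv_out_size(6, 6, stride=1, padding=0))  # 1
print(conv_out_size(6, 6, stride=1, padding=1))  # 3
```

So an output depth of 3 only comes out if one zero slice is padded on each side of D, and that padding of whole "fake frames" is exactly the part I don't understand.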