Running conv2d on tensor [batch, channel, sequence, H,W]

Hi, I am working with video frame data where the input is a tensor of the form [batch, channel, frame_sequence, height, width] (let me denote it by [B, C, S, H, W] for clarity). So each batch element is basically a consecutive sequence of frames. What I want to do is run an encoder (consisting of several conv2d layers) on each frame, i.e. on each [C, H, W], and get the result back as [B, C_output, S, H_output, W_output]. Now conv2d expects input of the form (N, C_in, H_in, W_in). I am wondering what's the best way to do this without messing up the order within the 5D tensor.
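For concreteness, the encoder is just a stack of conv2d layers. A toy stand-in (the channel counts and kernel sizes below are made up for illustration, my real encoder is deeper) would be something like:

import torch.nn as nn

# Toy stand-in for the encoder: maps C=2 input channels to C_output=8,
# keeping H and W unchanged thanks to padding=1.
encoder = nn.Sequential(
    nn.Conv2d(2, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
)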

So far I am considering the following approach:

>>> import torch
>>> from torch.autograd import Variable
>>> # B, C, seq, H, W
>>> # 4, 2, 5,  3, 3
>>> x = Variable(torch.rand(4, 2, 5, 3, 3))
>>> x.size()
torch.Size([4, 2, 5, 3, 3])
>>> x = x.permute(0, 2, 1, 3, 4)
>>> x.size()  # expected: [4, 5, 2, 3, 3], i.e. B, seq, C, H, W
torch.Size([4, 5, 2, 3, 3])
>>> x = x.contiguous().view(-1, 2, 3, 3)
>>> x.size()
torch.Size([20, 2, 3, 3])

And then run the conv2d encoder on the reshaped x and reshape the result back to 5D. But I think this might not preserve the original order within the tensor. So, how can I achieve the goal?
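For reference, the full round trip I would expect to write is roughly this (using the toy encoder above; the shapes in the comments are what I hope to get back, but I am not certain the permute/view pairs really put every frame back in its original slot):

import torch

x = torch.rand(4, 2, 5, 3, 3)            # [B, C, S, H, W]
B, C, S, H, W = x.shape
x = x.permute(0, 2, 1, 3, 4)             # [B, S, C, H, W]
x = x.contiguous().view(B * S, C, H, W)  # fold the sequence into the batch: [B*S, C, H, W]
y = encoder(x)                           # [B*S, C_output, H_out, W_out]
C_out, H_out, W_out = y.shape[1:]
y = y.view(B, S, C_out, H_out, W_out)    # unfold the batch back: [B, S, C_output, H_out, W_out]
y = y.permute(0, 2, 1, 3, 4)             # [B, C_output, S, H_out, W_out]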