How to combine temporal and spatial dimensions?

I have a batch of videos, each a sequence of images, where each batch has the shape:

(batch_size, 3, num of images in a video, height of image, width of image)

and I want to convert it into:

(batch_size, 3 * num of images in a video, height of image, width of image)

so it's like combining the 1st and 2nd dimensions (counting from 0), i.e. the channel and time dimensions.

Also, I don’t want the reshaping procedure to depend on batch_size.

You could use the view operation as:

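import torch
B, C, T, H, W = 2, 3, 8, 32, 32  # example sizes only
x = torch.randn(B, C, T, H, W)
# Merge dims 1 and 2 (C and T): -1 infers C * T, and x.size(0) avoids hard-coding the batch size
x = x.view(x.size(0), -1, x.size(3), x.size(4))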
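
If it helps, here is a small sketch of two equivalent alternatives, assuming the same (B, C, T, H, W) layout; the sizes are made up for illustration. `flatten(1, 2)` merges exactly those two dimensions without touching the batch size, and `reshape` is an option if the tensor might be non-contiguous (e.g. after a `permute`), where `view` would raise an error. The last lines check how the merged channel axis is ordered: frame `t` of channel `c` should land at index `c * T + t`.

```python
import torch

# Example sizes only: (B, C, T, H, W)
x = torch.randn(4, 3, 8, 16, 16)

# flatten merges dims 1 through 2 (inclusive) into one dim of size C * T
y = x.flatten(1, 2)

# reshape gives the same result and also accepts non-contiguous inputs
z = x.reshape(x.size(0), -1, *x.shape[3:])

# Ordering of the merged axis: channel c, frame t -> index c * T + t
c, t = 1, 5
same = torch.equal(y[:, c * x.size(2) + t], x[:, c, t])
```

Note that with this ordering all frames of the first channel come first, then all frames of the second, and so on. If you would rather have the frames as the outer grouping, you could `permute(0, 2, 1, 3, 4)` before reshaping.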