How to apply Conv2D to [time_dim, batch_dim, C_out, H_out, W_out]?

Hi all! I’m dealing with a batch of images where each item in the batch is a time sequence of data, so essentially each entry in the batch is a set of frames. I’d like to apply convolutions to the last three channels, but of course Conv2D expects 4 dimensions.

Is the correct approach to combine time_dim and batch_dim (converting the overall input to 4 dimensions), apply the conv layer, and then split the result back out to 5 dimensions?

I don’t fully understand this idea. Do you want to slice the channels or to apply the conv to three dimensions?
In the latter case you could permute the input into the shape [batch_size, channels, temp, height, width] and use an nn.Conv3d layer.
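
For reference, a minimal sketch of that Conv3d variant, assuming the input is laid out as [time, batch, channels, height, width]. The sizes, the 8 output channels, and the kernel size are made up for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for this example
T, B, C_in, H, W = 5, 2, 3, 16, 16
x = torch.randn(T, B, C_in, H, W)          # [time, batch, channels, height, width]

# nn.Conv3d expects [batch, channels, depth, height, width],
# so move time into the depth slot
x3d = x.permute(1, 2, 0, 3, 4)             # [B, C_in, T, H, W]

conv3d = nn.Conv3d(C_in, 8, kernel_size=3, padding=1)
out3d = conv3d(x3d)                        # [B, 8, T, H, W]
print(out3d.shape)
```

Note that this kernel also slides along the time axis, so neighbouring frames are mixed together.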

Essentially, I have a batch of sequences of images (where the image dimensions are the last three channels). Conv2D wants a 4-dimensional input, but I have 5 dimensions. The time dimension isn’t a depth dimension, so I don’t want a 3D convolution (i.e. these aren’t 3D images or point clouds, they’re just sequences of 2D images).

What I’m wondering is, can I combine the batch and time dimensions, then apply a Conv2D as normal, and then split those dimensions back out while preserving the original time sequence ordering?

Yes, if you want to treat each time step as a separate sample, you can flatten the temporal dimension into the batch dimension. However, you are still mentioning channels, while I assume you mean dimensions?
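
Something like the following should work as a rough sketch. The sizes, the 8 output channels, and the kernel settings are assumptions for illustration only, and the input is assumed to be a contiguous tensor of shape [time, batch, channels, height, width]:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for this example
T, B, C_in, H, W = 5, 2, 3, 16, 16
x = torch.randn(T, B, C_in, H, W)          # [time, batch, channels, height, width]

conv = nn.Conv2d(C_in, 8, kernel_size=3, padding=1)

# Merge time and batch into a single leading dimension: [T*B, C_in, H, W]
y = conv(x.reshape(T * B, C_in, H, W))

# Split the leading dimension back out: [T, B, C_out, H_out, W_out]
y = y.reshape(T, B, *y.shape[1:])
print(y.shape)

# Sanity check on ordering: the frame at (t=3, b=1) should match
# running the same conv on that single frame
ref = conv(x[3, 1].unsqueeze(0))[0]
ok = torch.allclose(y[3, 1], ref, atol=1e-4)
print(ok)
```

Because the reshape only merges and then re-splits the two leading dimensions in the same order, each output frame stays at its original (time, batch) index. The conv sees every frame as an independent sample, so no information is shared across time steps.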