`nn.Conv2d` expects an input of shape [batch_size, channels, height, width], while you are passing a 5-dimensional input, as shown in the error message.
Remove the unnecessary dimension and permute the input into the aforementioned channels-first memory layout.
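A minimal sketch of both steps, assuming (hypothetically) that the extra dimension is a singleton at position 1 and that the data is otherwise in channels-last order:

```python
import torch
import torch.nn as nn

# Hypothetical 5D input: a redundant singleton dim plus channels-last layout
x = torch.randn(1, 1, 374, 402, 3)  # [batch, extra, height, width, channels]

x = x.squeeze(1)           # drop the unnecessary dim -> [1, 374, 402, 3]
x = x.permute(0, 3, 1, 2)  # channels-first -> [1, 3, 374, 402]

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
out = conv(x)
print(out.shape)  # torch.Size([1, 16, 374, 402])
```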
PS: you can post code snippets by wrapping them into three backticks ```.
You can either reduce the dimensions as @ptrblck suggests, or, if it fits your use case, use `nn.Conv3d` instead of its 2D counterpart.
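For illustration, `nn.Conv3d` consumes 5D input directly, expecting the layout [batch, channels, depth/frames, height, width] (the shapes below are made up):

```python
import torch
import torch.nn as nn

# Hypothetical video-style input: [batch, channels, frames, height, width]
x = torch.randn(2, 3, 8, 32, 32)

conv3d = nn.Conv3d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
out = conv3d(x)
print(out.shape)  # torch.Size([2, 16, 8, 32, 32])
```

Note that if your frames dimension sits elsewhere (e.g. [batch, frames, channels, height, width]), you would still need to `permute` it into the [N, C, D, H, W] layout first.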
You can use the `.reshape()` method to flatten the data. Specifically, if you have a tensor of shape [batch_size, num_frames, channels, height, width], you can `.reshape()` it into a tensor of shape [batch_size * num_frames, channels, height, width]. This stacks all frames across the batch into a single large batch. The result should be: [1, 374, 402, 3].