How should I understand the D in (N, C, D, H, W)?
Let’s say, for example, I have five video frames and I stack them along the channel dimension, giving me
a (1, 15, H, W) tensor, assuming RGB frames. How do I reshape this tensor to (N, C, D, H, W)?
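
For concreteness, here is a minimal sketch of one way the reshape might go. It assumes that D is the depth dimension in the sense of `torch.nn.Conv3d` (for video, the number of frames) and that the 15 channels are ordered frame by frame (frame 0 RGB, then frame 1 RGB, and so on). The sizes chosen for H and W and the names `frames`, `stacked` and `video` are placeholders for illustration only.

```python
import torch

# Placeholder sizes: 1 clip, 5 frames, 3 colour channels, small H and W.
N, D, C, H, W = 1, 5, 3, 4, 6

# Five separate RGB frames, each of shape (N, C, H, W).
frames = [torch.randn(N, C, H, W) for _ in range(D)]

# Stacking along the channel dimension gives (1, 15, H, W).
# Channels 0-2 belong to frame 0, channels 3-5 to frame 1, and so on.
stacked = torch.cat(frames, dim=1)

# Split the 15 channels into (D, C), matching the frame-by-frame order,
# then swap the two axes so channels come before depth: (N, C, D, H, W).
video = stacked.view(N, D, C, H, W).permute(0, 2, 1, 3, 4).contiguous()

print(video.shape)  # torch.Size([1, 3, 5, 4, 6])
```

Viewing the tensor directly as `(N, C, D, H, W)` would run without error, but under the ordering assumed here it would mix colour channels and frames, which is why the sketch splits into `(N, D, C, H, W)` first and then permutes. If the channels were instead grouped by colour (all R, then all G, then all B), a plain `stacked.view(N, C, D, H, W)` would be the matching choice.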