How to understand the terms temporal, spatial, and volumetric

When dealing with tensors, I know the spatial coordinates are X, Y, and Z (if present). I also know the batch dimension is not temporal.

Are image channels volumetric or temporal coordinates? Both make sense to me.

I would claim it depends more on the operations you are applying to the input than the input itself.
E.g. in a conv layer the channel dimension could be seen as “volumetric”, since each conv filter uses all input channels at every window position and does not “slide” along the channel dimension.
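To make the conv case concrete, here is a minimal sketch, assuming PyTorch and `nn.Conv2d`; the tensor sizes (batch of 2, 3 channels, 8x8 images, 5 filters) are arbitrary values picked for illustration. Each filter's weight covers all 3 input channels, so the kernel only moves over height and width:

```python
import torch
import torch.nn as nn

# Illustrative input: [batch, channels, height, width]
x = torch.randn(2, 3, 8, 8)

# 5 filters with a 3x3 spatial window; no padding, stride 1
conv = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3)
out = conv(x)

# Weight layout is [out_channels, in_channels, kH, kW]:
# every filter spans all 3 input channels at once.
weight_shape = tuple(conv.weight.shape)

# Height and width shrink from 8 to 6 because the window slides over them;
# the input channel dimension is fully consumed by each filter.
out_shape = tuple(out.shape)

print(weight_shape, out_shape)
```

The spatial dims change with the kernel size, while the output channel count is simply the number of filters.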
However, you could certainly reshape the input and pass the channels as the “temporal” dimension to an RNN. I don’t know if it makes sense for your use case, but nothing stops you from treating the data as you wish :wink:
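As a rough sketch of the RNN idea, assuming PyTorch and `nn.RNN` with `batch_first=True` (the tensor sizes and the hidden size of 16 are arbitrary values for illustration): flatten each channel's pixels into a feature vector and let the channel dimension play the role of the sequence steps.

```python
import torch
import torch.nn as nn

# Illustrative input: [batch, channels, height, width]
x = torch.randn(2, 3, 8, 8)
n, c, h, w = x.shape

# Treat channels as the "temporal" axis: [batch, seq_len=C, features=H*W]
seq = x.reshape(n, c, h * w)

# batch_first=True makes the RNN expect [batch, seq_len, features]
rnn = nn.RNN(input_size=h * w, hidden_size=16, batch_first=True)
out, h_n = rnn(seq)

# One output vector per "time step", i.e. per channel
rnn_out_shape = tuple(out.shape)

print(rnn_out_shape)
```

Whether the order of the channels carries any meaning as a sequence depends on your data; the code only shows that the reshaping works mechanically.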