Is there a formal definition for `in_channels`?

When creating a convolution layer in PyTorch, the function takes an argument called in_channels. I am wondering if there is a formal definition of what in_channels actually means, especially for the first layer, where in_channels depends on what your data looks like.

When working with grayscale and colored images, I understand in_channels is set to 1 and 3, respectively, where 3 corresponds to red, green, and blue.
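As a minimal sketch of the standard setup (the shapes and layer sizes here are arbitrary, just for illustration): in_channels of the first layer must match the channel dimension C of the input tensor, which PyTorch expects in [N, C, H, W] order.

```python
import torch
import torch.nn as nn

# First conv layer for RGB images: in_channels=3 matches the
# channel dimension of the [N, C, H, W] input tensor.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

x = torch.randn(8, 3, 200, 200)  # batch of 8 RGB images, 200x200 pixels
y = conv(x)
print(y.shape)  # torch.Size([8, 16, 200, 200])
```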

Say you have a colored image that is 200x200 pixels. The standard would be to set in_channels = 3 in the first conv layer. Would it be possible to structure the input data differently, and set in_channels equal to the width or height dimension of the input image, i.e., in_channels = 200?

Technically, you can do that by reordering the dimensions of your input image so that the data has shape [N, channels, width, height], but the question is whether you will get good results. In that case, one of the remaining spatial dimensions will be very small (1 or 3, for a grayscale or colored image), so the kernel size along that dimension is limited accordingly.
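To make the constraint concrete, here is a sketch (with an arbitrary out_channels and kernel size) of permuting the width dimension into the channel slot; the former channel dimension (size 3) becomes a spatial axis, capping the kernel size there at 3:

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 200, 200)     # standard [N, C, H, W] for RGB images
x_permuted = x.permute(0, 3, 1, 2)  # [N, W, C, H]: width becomes "channels"
print(x_permuted.shape)             # torch.Size([8, 200, 3, 200])

# The conv layer now needs in_channels=200, and one spatial dim is only 3,
# so the kernel cannot be larger than 3 along that axis.
conv = nn.Conv2d(in_channels=200, out_channels=16, kernel_size=(3, 5))
y = conv(x_permuted)
print(y.shape)                      # torch.Size([8, 16, 1, 196])
```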

Yes exactly. The reason why I posed that question was to really see what the formal definition of in_channels is. So far I haven’t found anything.

Generally, should we order the dimensions of the input image such that channels corresponds to the smallest dimension, and width and height correspond to the two larger dimensions? As you said, this would allow more flexibility with kernel sizes and strides.

For actual images, I think it is intuitive what width, height, and channels should correspond to. But CNNs can also be applied to non-image data (e.g., time series data), which is what I am working with. It is not clear to me what I should be using for in_channels in my application.

For time series data, just think of how many data streams you have – those are effectively your channels.
I don't think I've seen a formal definition, but channels are the streams of data in the same dimension that you want to learn correlations across.

You can make your width dimension the channels dimension if you want, and that does make sense for certain data.
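For time series, this idea is usually expressed with nn.Conv1d. A minimal sketch, assuming a hypothetical dataset of 5 sensor streams sampled at 100 time steps (all sizes here are made up for illustration):

```python
import torch
import torch.nn as nn

# 5 sensor streams, each sampled at 100 time steps. Each stream is one
# channel; the kernel slides along the time axis and learns correlations
# across all 5 streams at each position.
conv = nn.Conv1d(in_channels=5, out_channels=8, kernel_size=7, padding=3)

x = torch.randn(4, 5, 100)  # [batch, streams (channels), time steps]
y = conv(x)
print(y.shape)              # torch.Size([4, 8, 100])
```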


@smth By “data streams” do you mean the discrete time intervals? E.g., if I have data at 10 discrete time intervals, would your statement mean in_channels = 10? If so, that is something I have considered, but I am not sure it is suitable for my application.

I am trying to build a CNN for a computational physics application. Essentially, my CNN input is temperature values. For each input sample, the temperature depends on 3-D spatial location and on time. I have the temperature at M locations for each discrete time, and N discrete times, so each input sample has N×M features, i.e., N×M temperature values.

I am trying to figure out how to wrangle this data so that I can feed it into the CNN. What I am thinking of doing is having an N×M×1 tensor, where N = width, M = height, 1 = in_channels. In this setup, the number of data streams does not correspond to in_channels. If I were to make the number of data streams correspond to in_channels, i.e., in_channels = N, then width = 1 and height = M, so the image becomes just a column rather than an N×M matrix/2nd-order tensor.
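The two structures described above can be sketched as follows (N and M are arbitrary placeholder sizes, and the out_channels and kernel sizes are made up for illustration):

```python
import torch
import torch.nn as nn

N, M = 10, 50              # hypothetical: 10 time steps, 50 spatial locations
temps = torch.randn(N, M)  # raw N x M temperature grid for one sample

# Option 1: treat the N x M grid as a single-channel "image".
x1 = temps.unsqueeze(0).unsqueeze(0)  # [batch=1, channels=1, N, M]
conv2d = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, padding=1)
print(conv2d(x1).shape)               # torch.Size([1, 4, 10, 50])

# Option 2: treat each time step as a channel over the spatial axis.
x2 = temps.unsqueeze(0)               # [batch=1, channels=N, M]
conv1d = nn.Conv1d(in_channels=N, out_channels=4, kernel_size=3, padding=1)
print(conv1d(x2).shape)               # torch.Size([1, 4, 50])
```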

Because of the temporal and spatial dependence of temperature (i.e., temperature values at nearby spatial locations are more similar than at distant locations, and temperature values at nearby times are more similar than at distant times), I think the former structure, with in_channels = 1, would be more suitable.