Understanding Convolution 1D output and Input

Well, not really. Currently you are using a signal of shape [32, 100, 1], which corresponds to [batch_size, in_channels, len].
Each kernel in your conv layer creates an output channel, as @krishnavishalv explained, and convolves over the “temporal” dimension, i.e. the len dimension.
Since len is set to 1 in your case, there won’t be much to convolve, as you are basically passing a single time stamp with 100 channels.
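
A quick sketch of this situation (the out_channels value is just an assumption for illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(32, 100, 1)  # [batch_size, in_channels, len]

# out_channels=64 is an arbitrary choice for illustration
conv = nn.Conv1d(in_channels=100, out_channels=64, kernel_size=1)
out = conv(x)
print(out.shape)  # torch.Size([32, 64, 1])

# With len=1, any kernel_size > 1 would fail (without padding),
# since the kernel cannot slide over a length-1 sequence.
```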
Try to think about your signal as a sound source. In a simple use case you would have 2 channels (left and right) and a certain length, e.g. 1000 time stamps. Your input would thus have the shape [batch_size, 2, 1000].
Now if you set up a conv layer, you would have to use in_channels=2 and an arbitrary number of out_channels. Remember, the out_channels just define the number of kernels. Each kernel is applied separately to the input.
The kernel size defines how much of the temporal dimension is used in a sliding-window fashion.
E.g. if you set kernel_size=5, 5 time stamps will be used for the convolution at each position.
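
Here is a minimal sketch of that stereo example (out_channels=16 is an arbitrary choice):

```python
import torch
import torch.nn as nn

x = torch.randn(32, 2, 1000)  # [batch_size, 2 channels, 1000 time stamps]

conv = nn.Conv1d(in_channels=2, out_channels=16, kernel_size=5)
out = conv(x)
print(out.shape)  # torch.Size([32, 16, 996]); 1000 - 5 + 1 = 996 (stride 1, no padding)
```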

In your use case, however, there is only a single time stamp, so you could simply use a linear layer instead.
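
Something like this minimal sketch, assuming you just want to treat the 100 values as plain features (out_features is again arbitrary):

```python
import torch
import torch.nn as nn

x = torch.randn(32, 100, 1).squeeze(2)  # drop the length-1 dim -> [32, 100]

fc = nn.Linear(in_features=100, out_features=64)
out = fc(x)
print(out.shape)  # torch.Size([32, 64])
```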

CS231n explains this concept really well.
