I have input of dimension 32 x 100 x 1 where 32 is the batch size.

I want to convolve over the 100 x 1 array in the input for each of the 32 such arrays, i.e. a single data point in the batch is one such array.

I hoped that a Conv1d(100, 100, 1) layer would work.

How does this layer convolve over the array? How many filters are created? Does it convolve over the 100 x 1 array, or is a filter created for each of the 100 dimensions?

I found so many terms while searching for this function: Filter, Length, Channel_in, Channel_out.

Any explanation of what the above layer does to the above input would be of great help.

100 filters are created, and the layer does convolve over the 100 x 1 array. One filter is created for each of the 100 output channels (number of filters = number of out_channels), and each filter spans all 100 input channels.
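To make the filter count concrete, here is a minimal sketch that inspects the parameters of the layer from the question. It assumes the input is laid out as [batch_size, in_channels, length], which is how nn.Conv1d reads a 3D tensor; the variable names are only for illustration.

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=100, out_channels=100, kernel_size=1)

# One filter per output channel; each filter covers all 100 input channels
# and 1 position along the length dimension.
print(conv.weight.shape)  # torch.Size([100, 100, 1])
print(conv.bias.shape)    # torch.Size([100])

x = torch.randn(32, 100, 1)  # [batch_size, in_channels, length]
out = conv(x)
print(out.shape)          # torch.Size([32, 100, 1])
```

So the layer holds 100 filters of size 100 x 1, and each filter produces one output value per position, which gives 100 outputs per sample here rather than 100 x 100.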

If each of the 100 filters convolves over the 100 x 1 array, then there should be 100 x 100 outputs, right? What will the output size be, and what is the filter size?

Thanks, but I am not getting how the convolutions are done: if each of the 100 filters convolves over the 100 x 1 array, then there should be 100 x 100 outputs, right?

Well, not really. Currently you are using a signal of shape [32, 100, 1], which corresponds to [batch_size, in_channels, len].
Each kernel in your conv layer creates an output channel, as @krishnavishalv explained, and convolves over the “temporal dimension”, i.e. the len dimension.
Since len is in your case set to 1, there won’t be much to convolve, as you basically passed a single time stamp with 100 channels.
Try to think about your signal as a sound source. In a simple use case you would have 2 channels (left and right) and a certain length, e.g. 1000 time stamps. Your input would thus have the shape [batch_size, 2, 1000].
Now if you set up a conv layer, you would have to use in_channels=2 and an arbitrary number of out_channels. Remember, out_channels just defines the number of kernels. Each kernel is applied separately to the input.
The kernel size defines how much of the temporal dimension is used in a sliding-window fashion.
E.g. if you set kernel_size=5, 5 time stamps will be used for the convolution at each position.
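The stereo example above could look like the following sketch. The batch size of 4 and out_channels=8 are arbitrary values picked for illustration; only in_channels=2, the length of 1000 and kernel_size=5 come from the description.

```python
import torch
import torch.nn as nn

batch_size = 4  # arbitrary, for illustration only
signal = torch.randn(batch_size, 2, 1000)  # [batch_size, channels, length]

# out_channels=8 is an arbitrary choice; it sets the number of kernels
conv = nn.Conv1d(in_channels=2, out_channels=8, kernel_size=5)
out = conv(signal)

# Each of the 8 kernels spans both input channels and 5 time stamps
print(conv.weight.shape)  # torch.Size([8, 2, 5])
# With stride 1 and no padding the window fits 1000 - 5 + 1 = 996 times
print(out.shape)          # torch.Size([4, 8, 996])
```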

In your use case, however, we only have one single time stamp, so that you could easily use a linear layer instead.
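As a rough check of the linear-layer remark, the sketch below copies the parameters of a Conv1d with kernel_size=1 into an nn.Linear and compares the outputs on a [32, 100, 1] input. The weight copy is only there to make the two layers comparable; it is not something you would need in practice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv1d(100, 100, kernel_size=1)
linear = nn.Linear(100, 100)

# Copy the conv parameters into the linear layer so both compute the same map
with torch.no_grad():
    linear.weight.copy_(conv.weight.squeeze(-1))  # [100, 100, 1] -> [100, 100]
    linear.bias.copy_(conv.bias)

x = torch.randn(32, 100, 1)          # single time stamp with 100 channels
out_conv = conv(x).squeeze(-1)       # [32, 100]
out_linear = linear(x.squeeze(-1))   # [32, 100]

# The two outputs should agree up to floating-point tolerance
print(torch.allclose(out_conv, out_linear, atol=1e-4))
```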

I read your detailed reply for this example, but I am still “lost in dimensions”. I would like to try my luck and ask you to help me out.

My data is about patients and has a shape of 239 (number of patients), 49 (rows per patient, i.e. time stamps), 5 (features). The output is a binary label, one per patient, so its shape is 239.

I assume that for my case I should use Conv1d, but I am lost with what is what… Based on your reply, I think the number of in_channels is the number of features in my case and the length is 49 (rows). If that is right, does it mean I have to somehow transpose the tensor so that my shape is 239, 5, 49? I am getting lost here.

You are right. The convolution operates on spatial/temporal data (temporal in our examples), so you should think of your data as having 5 features for each time stamp, not 5 time stamps for each feature.

A better way is to assume that your input data is the output of another operation: in this case, the 5 features correspond to 5 different kernels that recorded features for all time stamps. When you define a conv layer, you need to specify the number of output channels, which can be interpreted in exactly the same way as your 5 features.

I do not know whether you are familiar with images, but for a 2D image (a 1D image is possible too, but does not make much sense) the input shape would be [number of images, channels, height, width], which for one RGB image would be [1, 3, 10, 10]. The 3 corresponds to red, green and blue, the same as your 5 features. The 10 x 10 part is the spatial domain, which in your case is a 1D temporal domain. So it could be like [1, 3, 10].

You can transpose the dimensions using tensor.permute().
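One thing worth checking on a small tensor is that permute really swaps the axes, whereas reshape only reinterprets the same memory order. The sketch below uses a tiny stand-in tensor of shape [2, 4, 3] (batch, time stamps, features); the sizes are made up so the values are easy to follow.

```python
import torch

# Small stand-in for [batch, time stamps, features]
x = torch.arange(2 * 4 * 3).reshape(2, 4, 3)

x_perm = x.permute(0, 2, 1)  # [batch, features, time stamps], axes swapped
x_resh = x.reshape(2, 3, 4)  # same shape, but the values are not transposed

print(x_perm.shape, x_resh.shape)
# Row 1 of the permuted sample is feature 1 across all time stamps
print(torch.equal(x_perm[0, 1], x[0, :, 1]))  # True
print(torch.equal(x_perm, x_resh))            # False
```

Both results have shape [2, 3, 4], so a shape check alone would not reveal the difference.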

First of all, thank you for your attention to my question and your time to answer me.
I will now describe my understanding, please correct me if I am still not getting it:

I need to first transpose my current tensor (239 groups each containing 49 rows/time stamps having 5 columns/features)

into a tensor of

5 rows (features) having 49 * 239 = 11 711 columns/time stamps (that represent 239 groups each having 49 columns)?

Then, as I wish to use 1 patient per batch (239 patients), for the batching I should slice the tensor 238 times, meaning that each batch dimension will be 5 (rows) x 49 (columns) and I will have 239 of these batches.

In this case my understanding is that the Conv1d would be:

torch.nn.Conv1d(
in_channels: 5 (features/rows),
out_channels: 5,
kernel_size: I can try different numbers here, the kernel will be sliding over 49 time stamps in each out of 239 batches,
stride: I can change, it is a kernel “step”,
padding: depending on the kernel size might use padding,
dilation: another hyperparameter I might tune,
groups: not applicable for me,
bias: bool = True,
padding_mode: str = ‘zeros’
)

Thank you in advance for any comments and your reply, if any.

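You do not need to flatten the patients into one long sequence; keep the batch dimension and only swap the last two dimensions:

import torch
import torch.nn as nn

x = torch.ones((239, 49, 5))  # [patients, time stamps, features]
x = x.permute((0, 2, 1))  # -> [patients, features, time stamps]
print(x.shape)  # torch.Size([239, 5, 49])
model = nn.Conv1d(5, 5, 3)  # in_channels=5, out_channels=5, kernel_size=3
output = model(x)
print(output.shape)  # torch.Size([239, 5, 47])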
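For the stride, padding and dilation settings mentioned in the question, the output length can be predicted with the formula given in the nn.Conv1d documentation. The sketch below wraps it in a hypothetical helper, conv1d_out_len (not part of PyTorch), and compares it with the actual layer for a few arbitrary settings.

```python
import torch
import torch.nn as nn

def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Hypothetical helper: output length formula from the nn.Conv1d docs."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

x = torch.ones(239, 5, 49)  # [patients, features, time stamps]

# (kernel_size, stride, padding, dilation), chosen arbitrarily for illustration
settings = [(3, 1, 0, 1), (3, 1, 1, 1), (5, 2, 0, 1), (3, 1, 0, 2)]
for k, s, p, d in settings:
    conv = nn.Conv1d(5, 5, kernel_size=k, stride=s, padding=p, dilation=d)
    out = conv(x)
    # The predicted length should match what the layer actually produces
    assert out.shape[-1] == conv1d_out_len(49, k, s, p, d)
    print((k, s, p, d), tuple(out.shape))
```

The batch dimension (239) and the number of output channels stay the same in every case; only the last dimension changes with these hyperparameters.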

If you want to see in more detail what happens in each channel, you can modify the example like this:

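import torch
import torch.nn as nn

x = torch.ones((239, 49, 5))
x = x.permute((0, 2, 1))  # permute feature and temporal dimensions
print(x.shape)  # torch.Size([239, 5, 49])
# remove bias and set the window to the whole temporal sequence; groups=5 gives each channel its own kernel
model = nn.Conv1d(5, 5, 49, groups=5, bias=False)
nn.init.constant_(model.weight, 1.)  # set kernel to one (instead of random)
output = model(x)
print(output.shape)  # torch.Size([239, 5, 1]); every value is 49, the sum over the 49 time stamps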
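To see what groups=5 changes, one option is to compare it with the default groups=1 on the same all-ones input. This is only a sketch; the names grouped and full are made up for the comparison.

```python
import torch
import torch.nn as nn

x = torch.ones(239, 5, 49)  # already in [patients, features, time stamps] order

grouped = nn.Conv1d(5, 5, 49, groups=5, bias=False)  # each kernel sees one channel
full = nn.Conv1d(5, 5, 49, groups=1, bias=False)     # each kernel sees all 5 channels
nn.init.constant_(grouped.weight, 1.0)
nn.init.constant_(full.weight, 1.0)

print(grouped.weight.shape)  # torch.Size([5, 1, 49])
print(full.weight.shape)     # torch.Size([5, 5, 49])

# All-ones input and kernels: the output is just the number of summed elements
print(grouped(x)[0, 0, 0].item())  # 49.0  (49 time stamps of one channel)
print(full(x)[0, 0, 0].item())     # 245.0 (49 time stamps x 5 channels)
```

With groups=5 every feature is processed independently, while the default mixes all features in every output channel.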