Understanding Convolution 1D output and Input


I have an input of dimension 32 x 100 x 1, where 32 is the batch size.

I want to convolve over the 100 x 1 array for each of the 32 samples, i.e. each data point in the batch is one such array.

I hoped that a conv1d(100, 100, 1) layer would work.

How does this layer convolve over the array? How many filters are created? Does it convolve over the 100 x 1 dimensional array, or is a filter created for each of the 100 dimensions?

I found many terms while searching for this function: filter, length, channel_in, channel_out.

Any help in explaining what the above layer does to the above input would be greatly appreciated.



In your example of conv1d(100, 100, 1):

in_channels = 100
out_channels = 100
kernel_size = 1
By default, stride = 1.

100 filters are created, and each one convolves over the 100x1 dimensional array. One filter is created for each of the 100 output channels, and each filter spans all 100 input channels, so its size is in_channels x kernel_size = 100 x 1. Number of filters = number of out_channels.


If each of the 100 filters convolves over the 100x1 dimensional array, then there should be 100 x 100 outputs, right? What will be the output size, and what is the filter size?

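Execute the following code:

import torch
from torch import nn

a = torch.randn(32, 100, 1)  # [batch_size, in_channels, length]
m = nn.Conv1d(100, 100, 1)   # in_channels=100, out_channels=100, kernel_size=1
out = m(a)
print(out.shape)  # torch.Size([32, 100, 1])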
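
The filter size asked about above can be read off the layer's parameters. A minimal sketch, assuming a recent PyTorch version; the variable names are only illustrative:

```python
import torch
from torch import nn

m = nn.Conv1d(100, 100, 1)
a = torch.randn(32, 100, 1)  # [batch_size, in_channels, length]

# One filter per output channel; each filter spans all input channels.
print(m.weight.shape)  # torch.Size([100, 100, 1]) = [out_channels, in_channels, kernel_size]
print(m.bias.shape)    # torch.Size([100])
print(m(a).shape)      # torch.Size([32, 100, 1])
```

So there are 100 filters of size 100 x 1, and each one produces a single value per sample here, giving 100 outputs per sample rather than 100 x 100.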

See the link below for a clearer explanation.



Thanks, I am not getting how the convolutions are done: if each of the 100 filters convolves over the 100x1 dimensional array, then there should be 100 x 100 outputs, right?

Well, not really. Currently you are using a signal of shape [32, 100, 1], which corresponds to [batch_size, in_channels, len].
Each kernel in your conv layer creates an output channel, as @krishnavishalv explained, and convolves over the “temporal dimension”, i.e. the len dimension.
Since len is set to 1 in your case, there won’t be much to convolve, as you basically passed a single time stamp with 100 channels.
Try to think about your signal as a sound source. In a simple use case you would have 2 channels (left and right) and a certain length, e.g. 1000 time stamps. Your input would thus have the shape [batch_size, 2, 1000].
Now if you set up a conv layer, you would have to use in_channels=2 and an arbitrary number of out_channels. Remember, the out_channels just define the number of kernels. Each kernel is applied separately to the input.
The kernel size defines how much of the temporal dimension is used in a sliding window fashion.
E.g. if you set kernel_size=5, 5 time stamps will be used for the convolution at each position.
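
The stereo example can be sketched as follows. The batch size of 8 and the 16 output channels are arbitrary values chosen for illustration, not something from the post:

```python
import torch
from torch import nn

batch_size = 8  # arbitrary value for illustration
signal = torch.randn(batch_size, 2, 1000)  # [batch_size, channels, length]

# 16 kernels, each looking at 5 consecutive time stamps of both channels.
conv = nn.Conv1d(in_channels=2, out_channels=16, kernel_size=5)
out = conv(signal)

# With stride=1 and no padding: out_len = 1000 - 5 + 1 = 996
print(out.shape)  # torch.Size([8, 16, 996])
```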

In your use case, however, we only have one single time stamp, so that you could easily use a linear layer instead.
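
A minimal sketch of that equivalence, assuming the parameters are copied from the conv layer into a linear layer by hand (the layer and variable names are illustrative):

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(32, 100, 1)  # [batch_size, in_channels, length=1]

conv = nn.Conv1d(100, 100, kernel_size=1)
linear = nn.Linear(100, 100)

# Copy the conv parameters into the linear layer, dropping the kernel dimension.
with torch.no_grad():
    linear.weight.copy_(conv.weight.squeeze(-1))  # [100, 100, 1] -> [100, 100]
    linear.bias.copy_(conv.bias)

out_conv = conv(x).squeeze(-1)      # [32, 100]
out_linear = linear(x.squeeze(-1))  # [32, 100]

# Both layers should compute the same values up to floating point error.
same = torch.allclose(out_conv, out_linear, atol=1e-5)
print(same)
```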

CS231n explains this concept really well.


I tried this, but I am getting this error: "TypeError: argument 0 is not a Variable". Do you know what the cause could be? Thank you!

I guess you are using a PyTorch version older than 0.4.

Try to wrap the tensors with autograd.Variable.

from torch.autograd import Variable
input = Variable(input)

Following their documentation, the arithmetic accumulates the convolution results of all channels, which is weird…
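
This accumulation over input channels can be checked directly. The sketch below uses the functional API with small, arbitrary shapes: it applies each input channel's slice of one filter separately and sums the results, which should match the regular convolution:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 10)  # [batch_size, in_channels, length]
w = torch.randn(1, 3, 4)   # one filter: [out_channels, in_channels, kernel_size]
b = torch.randn(1)

out = F.conv1d(x, w, b)    # [1, 1, 7]

# Convolve each input channel with its own slice of the filter, then sum.
per_channel = [F.conv1d(x[:, c:c + 1], w[:, c:c + 1]) for c in range(3)]
manual = sum(per_channel) + b

same = torch.allclose(out, manual, atol=1e-5)
print(same)
```

This is the same behaviour as a 2D convolution over an RGB image, where one filter combines all three colour channels into a single output channel.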

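The following chunk of code might complement the answer of @ptrblck.

import torch
from torch import nn
import numpy as np

def convolve_slice(X, W, b, stride=2):
    # Naive 2D sliding-window convolution of a square input X with a square kernel W.
    xH, xW = X.shape
    wH, wW = W.shape
    assert wH == wW
    k = wH
    vH, vW = (xH - wH) // stride + 1, (xW - wW) // stride + 1
    assert vH == vW
    v = np.zeros((vH, vW))  # output shape
    for i in range(vH):
        for j in range(vW):
            v[i, j] = np.sum(X[stride*i:stride*i + k, stride*j:stride*j + k] * W) + b
    return v

# Example input and stride (assumed values; the original post did not define them).
xH, xW = 7, 7
x = np.random.rand(xH, xW)
stride = 2

# show it if you will.
#plt.imshow(x, interpolation='none')

k = 3  # kernel/filter shape.
# A (k, k) kernel needs nn.Conv2d; nn.Conv1d only takes a 1D kernel.
conv = nn.Conv2d(in_channels=1,
                 kernel_size=(k, k),
                 out_channels=1, stride=stride)
# from numpy to torch
x_torch = torch.from_numpy(x)
x_torch = x_torch.float()
x_torch = x_torch.view(1, 1, xH, xW)

w0 = conv.weight.data.numpy()
b0 = conv.bias.data.numpy()
w0 = w0.reshape(k, k)

# The manual result should be close to the layer's output.
out_manual = convolve_slice(x, w0, b0[0], stride=stride)
out_torch = conv(x_torch).detach().numpy().reshape(out_manual.shape)
print(np.allclose(out_manual, out_torch, atol=1e-5))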


Dear Piotr,

I read your detailed reply for this example but I am still “lost in dimensions”. I would like to try my luck and ask you to help me out.

My data is about patients; it has a shape of 239 (number of patients) x 49 (rows per patient, i.e. time stamps) x 5 (features). The output is a binary label, one per patient, so its shape is 239.

I assume that for my case I should use Conv1d. I am then lost with what is what… Based on your reply, I am starting to think that the number of in_channels is the number of features in my case and the length is 49 (rows). If that is right, does it mean I have to somehow transpose the tensor so that my shape is 239, 5, 49? I am getting lost here.

Thank you in advance for any comments…

Best regards,


You are right. The convolution operation works on spatial/temporal data (in our examples), and you can think of your data this way: you have 5 features for each time stamp, not 5 time stamps for each feature.

A better way is to assume that your input data is the output of another operation; in this case, the 5 features correspond to 5 different kernels that recorded features for all time stamps. When you define a conv layer, you need to specify the number of output channels, which could have exactly the same meaning as your 5 features.

I do not know whether you are familiar with images, but for a 2D image (a 1D image is possible too but does not make much sense), the shape of the input would be [number of images, channels, height, width], which for one RGB image of size 10x10 would be [1, 3, 10, 10]. The 3 corresponds to red, green and blue, just like your 5 features. The 10x10 is the spatial domain, which in your case is a 1D temporal domain, so it could be like [1, 3, 10].

You can transpose the dimensions using tensor.permute().
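
A small sketch of why permute is the right tool here, using a tiny stand-in tensor instead of the real patient data: permute swaps the axes while keeping each value attached to its feature, whereas reshaping to the same shape would mix features and time stamps.

```python
import torch

# Tiny stand-in for a [patients, time stamps, features] tensor.
x = torch.arange(2 * 4 * 3).reshape(2, 4, 3)  # [batch=2, time=4, features=3]

permuted = x.permute(0, 2, 1)  # [2, 3, 4]: axes swapped, values stay with their feature
reshaped = x.reshape(2, 3, 4)  # [2, 3, 4]: same shape, but rows mix features and time stamps

# Feature 0 of sample 0 across the 4 time stamps:
print(x[0, :, 0])      # tensor([0, 3, 6, 9])
print(permuted[0, 0])  # tensor([0, 3, 6, 9])
print(reshaped[0, 0])  # tensor([0, 1, 2, 3])
```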



Dear Doosti and all,

First of all, thank you for your attention to my question and your time to answer me.
I will now describe my understanding, please correct me if I am still not getting it:

I need to first transpose my current tensor (239 groups, each containing 49 rows/time stamps with 5 columns/features)

into a tensor of

5 rows (features) with 49 * 239 = 11,711 columns/time stamps (representing 239 groups of 49 columns each)?

Then, as I wish to use 1 patient per batch (239 patients), for the batching I should slice the tensor 238 times, so that each batch has dimensions 5 (rows) x 49 (columns) and I have 239 of these batches.

In this case my understanding is that the Conv1d would be:

torch.nn.Conv1d(
    in_channels: 5 (features/rows),
    out_channels: 5,
    kernel_size: I can try different numbers here; the kernel will slide over the 49 time stamps in each of the 239 batches,
    stride: I can change this; it is the kernel “step”,
    padding: depending on the kernel size, I might use padding,
    dilation: another hyperparameter I might tune,
    groups: not applicable for me,
    bias: bool = True,
    padding_mode: str = 'zeros'
)

Thank you in advance for any comments and your reply, if any.



Yes, you are absolutely correct.

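Here is a snippet for your case:

import torch
from torch import nn

x = torch.ones((239, 49, 5))  # [patients, time stamps, features]
x = x.permute((0, 2, 1))      # [patients, features, time stamps] = [239, 5, 49]
model = nn.Conv1d(5, 5, 3)    # in_channels=5, out_channels=5, kernel_size=3
output = model(x)
print(output.shape)  # torch.Size([239, 5, 47])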

If you want to know a little more about what happens in each channel, you can play with the above example like this:

x = torch.ones((239, 49, 5))
x = x.permute((0, 2, 1))  # permute feature and temporal dimensions
model = nn.Conv1d(5, 5, 49, groups=5, bias=False)  # remove bias and set the window to the whole temporal sequence
nn.init.constant_(model.weight, 1.)  # set kernel to one (instead of random)
output = model(x)
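
For reference, here is a self-contained version of this experiment with the shapes printed. With groups=5, each filter only sees its own input channel, so every output value should simply be the sum of 49 ones:

```python
import torch
from torch import nn

x = torch.ones((239, 49, 5)).permute(0, 2, 1)  # [239, 5, 49]
model = nn.Conv1d(5, 5, 49, groups=5, bias=False)
nn.init.constant_(model.weight, 1.)

output = model(x)
print(model.weight.shape)  # torch.Size([5, 1, 49]): each filter sees one channel
print(output.shape)        # torch.Size([239, 5, 1])
print(output[0, :, 0])     # each entry is the sum of 49 ones, i.e. 49.0
```

Without groups=5, each filter would sum over all 5 channels as well, and every output value would be 5 * 49 = 245 instead.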