Understanding Convolution 1D output and Input

Rajesh_Karmakar · November 28, 2018, 11:36am

Hi,

I have an input of dimensions 32 x 100 x 1, where 32 is the batch size.

I want to convolve over the 100 x 1 array in the input for each of the 32 such arrays, i.e. a single data point in the batch is one such array.

I hoped that a conv1d(100, 100, 1) layer would work.

How does this convolve over the array? How many filters are created? Does it convolve over the 100 x 1 array, or is a filter created for each of the 100 dimensions?

While searching for this function I found so many terms: filter, length, in_channels, out_channels.

Any help explaining what the above layer does to the above input would be a great help.

Cheers!

krishnavishalv · November 28, 2018, 11:55am

In your example, conv1d(100, 100, 1):

in_channels = 100
out_channels = 100
kernel_size = 1
By default stride = 1.

100 filters are created, one per output channel: number of filters = number of out_channels. Each filter has shape 100 x 1 (in_channels x kernel_size), so every filter covers the whole 100 x 1 input rather than a single one of the 100 dimensions.
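A quick way to check this is to inspect the layer's parameters. This is a small sketch assuming a recent PyTorch version, where the weight of nn.Conv1d has shape [out_channels, in_channels, kernel_size]:

```python
import torch
from torch import nn

m = nn.Conv1d(in_channels=100, out_channels=100, kernel_size=1)

# One filter per output channel; each filter spans all 100 input channels.
print(m.weight.shape)  # torch.Size([100, 100, 1])
print(m.bias.shape)    # torch.Size([100]), one bias per filter

x = torch.randn(32, 100, 1)  # [batch_size, in_channels, length]
out = m(x)
print(out.shape)             # torch.Size([32, 100, 1])
```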

Rajesh_Karmakar · November 28, 2018, 12:02pm

If each of the 100 filters convolves over the 100x1 array, then there should be 100 x 100 outputs, right? What will the output size be? And what is the filter size?

krishnavishalv · November 28, 2018, 12:11pm

Execute the following code:

import torch
from torch import nn

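a = torch.randn(32, 100, 1)  # [batch_size, in_channels, length]
m = nn.Conv1d(100, 100, 1)   # in_channels=100, out_channels=100, kernel_size=1
out = m(a)
print(out.size())  # torch.Size([32, 100, 1])
print(m)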

See the link below for a clearer explanation.

https://pytorch.org/docs/stable/nn.html#torch.nn.Conv1d

Rajesh_Karmakar · November 28, 2018, 12:14pm

Thanks, but I still don't get how the convolutions are done: if each of the 100 filters convolves over the 100x1 array, then there should be 100 x 100 outputs, right?

ptrblck · November 28, 2018, 12:49pm

Well, not really. Currently you are using a signal of shape [32, 100, 1], which corresponds to [batch_size, in_channels, len].
Each kernel in your conv layer creates an output channel, as @krishnavishalv explained, and convolves over the “temporal dimension”, i.e. the len dimension.
Since len is in your case set to 1, there won’t be much to convolve, as you basically passed a single time stamp with 100 channels.
Try to think about your signal as a sound source. In a simple use case you would have 2 channels (left and right) and a certain length, e.g. 1000 time stamps. Your input would thus have the shape [batch_size, 2, 1000].
Now if you set up a conv layer, you would have to use in_channels=2 and an arbitrary number of out_channels. Remember, the out_channels just define the number of kernels. Each kernel is applied separately on the input.
The kernel size defines how much of the temporal dimension is used in a sliding window fashion.
E.g. if you set kernel_size=5, 5 time stamps will be used for the convolution for each position.
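To make the sound example concrete, here is a sketch with made-up sizes (a batch of 4 stereo signals with 1000 time stamps each and 16 output channels, chosen only for illustration). With kernel_size=5, no padding and stride 1, the window fits at 1000 - 5 + 1 = 996 positions:

```python
import torch
from torch import nn

batch_size, length = 4, 1000               # illustrative sizes
x = torch.randn(batch_size, 2, length)     # [batch_size, in_channels (left/right), len]

conv = nn.Conv1d(in_channels=2, out_channels=16, kernel_size=5)
out = conv(x)

print(conv.weight.shape)  # torch.Size([16, 2, 5]): 16 kernels, each 2 channels x 5 time stamps
print(out.shape)          # torch.Size([4, 16, 996])
```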

In your use case, however, you only have a single time stamp, so you could just as well use a linear layer instead.
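The equivalence can be checked numerically. A minimal sketch, assuming the nn.Linear layer is given the same parameters as the conv layer (the kernel dimension of size 1 is squeezed away):

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(32, 100, 1)  # [batch_size, in_channels, len=1]

conv = nn.Conv1d(100, 100, kernel_size=1)
linear = nn.Linear(100, 100)

# Copy the conv parameters into the linear layer: weight [100, 100, 1] -> [100, 100]
with torch.no_grad():
    linear.weight.copy_(conv.weight.squeeze(-1))
    linear.bias.copy_(conv.bias)

    out_conv = conv(x)                   # [32, 100, 1]
    out_linear = linear(x.squeeze(-1))   # [32, 100]

same = torch.allclose(out_conv.squeeze(-1), out_linear, atol=1e-5)
print(same)  # True
```

Both layers compute, for every output channel, a weighted sum over the 100 input channels plus a bias, so with a length of 1 there is nothing left for the sliding window to do.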

CS231n explains this concept really well.

smu226 · February 17, 2019, 5:42am

I tried this, but I am getting this error: TypeError: argument 0 is not a Variable. Do you know what the cause could be? Thank you!

InnovArul · February 17, 2019, 5:47am

I guess you are using a PyTorch version older than 0.4 (Tensor and Variable were merged in 0.4).

Try to wrap the tensors with autograd.Variable.

from torch.autograd import Variable
input = Variable(input)

shlomiamitai · July 3, 2019, 2:16pm

Following their documentation, the output accumulates (sums) the convolution results of all input channels, which seems weird…
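Summing over the input channels is the standard behaviour of a conv layer: output channel o is bias[o] plus the sum, over all input channels c, of the cross-correlation of input channel c with weight[o, c]. A minimal sketch that rebuilds one output channel from single-channel convolutions with torch.nn.functional.conv1d (all sizes are arbitrary):

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
x = torch.randn(8, 3, 20)  # [batch_size, in_channels=3, len=20]
conv = nn.Conv1d(in_channels=3, out_channels=4, kernel_size=5)

with torch.no_grad():
    out = conv(x)  # [8, 4, 16]

    o = 2  # pick one output channel, i.e. one filter
    manual = torch.zeros(8, 1, 16)
    for c in range(3):
        # single-channel input [8, 1, 20] with the matching kernel slice [1, 1, 5]
        manual += F.conv1d(x[:, c:c + 1, :], conv.weight[o:o + 1, c:c + 1, :])
    manual += conv.bias[o]  # one bias per output channel, added once

match = torch.allclose(out[:, o:o + 1, :], manual, atol=1e-5)
print(match)  # True
```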

caglar_demir · August 27, 2020, 1:28pm

The following chunk of code might complement the answer of @ptrblck.

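import torch
from torch import nn
import numpy as np
np.random.seed(1)

def convolve_slice(X, W, b, stride=2):
    """Naive 2D cross-correlation of a single-channel input X with a square kernel W."""
    xH, xW = X.shape
    wH, wW = W.shape
    vH, vW = (xH - wH) // stride + 1, (xW - wW) // stride + 1  # output shape
    assert vH == vW
    assert wH == wW
    k = wH
    v = np.zeros((vH, vW))
    for i in range(vH):
        for j in range(vW):
            # element-wise product of the current window with the kernel, summed, plus bias
            v[i, j] = np.sum(X[stride*i:stride*i + k, stride*j:stride*j + k] * W) + b
    return v


xH, xW = 7, 7
stride = 2
x = np.random.randn(xH, xW)

# show it if you will (needs: import matplotlib.pyplot as plt).
#plt.imshow(x, interpolation='none')
#plt.show()


k = 3  # kernel/filter shape.
# A k x k kernel sliding over a 2D input is a 2D convolution, so nn.Conv2d is used here
# (nn.Conv1d expects a 3D [batch_size, channels, length] input).
conv = nn.Conv2d(in_channels=1,
                 kernel_size=(k, k),
                 out_channels=1, stride=stride)
# from numpy to torch: [batch_size, channels, height, width]
x_torch = torch.from_numpy(x)
x_torch = x_torch.float()
x_torch = x_torch.view(1, 1, xH, xW)
print(conv(x_torch))

w0 = conv.weight.data.numpy()
b0 = conv.bias.data.numpy()[0]  # the bias has shape (1,), take the scalar
w0 = w0.reshape(k, k)

m = convolve_slice(x, w0, b0, stride)
print(m)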
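Since the thread is about nn.Conv1d, the same comparison can also be written for the 1D case. A minimal sketch for a single-channel signal, assuming stride 2 and no padding; the helper name convolve_1d is hypothetical. Like the 2D helper above, it does not flip the kernel, because PyTorch's convolution layers compute a cross-correlation:

```python
import numpy as np
import torch
from torch import nn

def convolve_1d(x, w, b, stride=2):
    """Naive 1D cross-correlation of a single-channel signal x with kernel w (hypothetical helper)."""
    k = len(w)
    out_len = (len(x) - k) // stride + 1
    out = np.zeros(out_len)
    for i in range(out_len):
        # dot product of the current window with the kernel, plus bias
        out[i] = np.dot(x[stride * i:stride * i + k], w) + b
    return out

np.random.seed(1)
x = np.random.randn(11)  # a single signal with 11 time stamps

k, stride = 3, 2
conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=k, stride=stride)

x_torch = torch.from_numpy(x).float().view(1, 1, -1)  # [batch_size, channels, len]
with torch.no_grad():
    ref = conv(x_torch).view(-1).numpy()

w0 = conv.weight.detach().numpy().reshape(k)
b0 = conv.bias.detach().item()
manual = convolve_1d(x, w0, b0, stride)

print(ref.shape, manual.shape)              # (5,) (5,)
print(np.allclose(ref, manual, atol=1e-5))  # True
```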

Alice_NL · October 5, 2020, 10:44am

Dear Piotr,

I read your detailed reply for this example, but I am still “lost in dimensions”. I would like to try my luck and ask you to help me out.

My data is about patients and has a shape of [239, 49, 5]: 239 patients, 49 rows (time stamps) per patient, and 5 features. The output is a binary label, one per patient, so its shape is [239].

I assume that for my case I should use Conv1d, but then I am lost about what is what… Based on your reply, I am starting to think that the number of in_channels is the number of features in my case and the length is 49 (rows). If that is right, does it mean I have to somehow transpose the tensor so that its shape is [239, 5, 49]? I am getting lost here.

Thank you in advance for any comments…

Best regards,
Alice

Nikronic · October 5, 2020, 7:47pm

Hi,

You are right. The convolution operation works on spatial/temporal data (in our examples), and you can think of your data this way: you have 5 features for each time stamp, not 5 time stamps for each feature.

Another way to look at it is to assume that your input data is the output of another operation: the 5 features correspond to 5 different kernels that recorded features for all time stamps. When you define a conv layer, you need to specify the number of output channels, which could have exactly the same meaning as your 5 features.

I do not know whether you are familiar with images, but for a 2D image (a 1D image is possible too, but does not make much sense), the input shape would be [number of images, channels, height, width], which for one 10x10 RGB image would be [1, 3, 10, 10]. The 3 corresponds to red, green and blue, just like your 5 features. The 10x10 part is the spatial domain, which in your case is a 1D temporal domain, so the shape would be like [1, 3, 10].

You can transpose the dimensions using tensor.permute().
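Note that permute is the right tool here, while view/reshape is not: reshape keeps the memory order and would regroup time stamps and features instead of transposing them. A minimal sketch with the shapes from the question (random values stand in for the patient data):

```python
import torch

x = torch.randn(239, 49, 5)  # [patients, time stamps, features]

x_perm = x.permute(0, 2, 1)      # [patients, features, time stamps] = [239, 5, 49]
x_wrong = x.reshape(239, 5, 49)  # same shape, but the values are regrouped, not transposed

# After permute, patient i's data is just the transpose of its original 49 x 5 matrix.
print(x_perm.shape)                     # torch.Size([239, 5, 49])
print(torch.equal(x_perm[0], x[0].T))   # True
print(torch.equal(x_wrong[0], x[0].T))  # False
```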

Bests

Alice_NL · October 6, 2020, 10:37am

Dear Doosti and all,

First of all, thank you for your attention to my question and your time to answer me.
I will now describe my understanding, please correct me if I am still not getting it:

I need to first transpose my current tensor (239 groups each containing 49 rows/time stamps having 5 columns/features)

into a tensor of

5 rows (features) having 49 * 239 = 11 711 columns/time stamps (that represent 239 groups each having 49 columns)?

Then, as I wish to use 1 patient per batch (239 patients), for the batching I should slice the tensor 238 times, meaning that each batch will have dimensions 5 (rows) x 49 (columns) and I will have 239 of these batches.
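Flattening everything into 5 x 11 711 and slicing by hand is not strictly needed: after permuting to [239, 5, 49], indexing the first dimension already gives one 5 x 49 patient. A minimal sketch using torch.utils.data, with random stand-in data and labels (assumptions for illustration):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

x = torch.randn(239, 49, 5)      # stand-in for the patient data
y = torch.randint(0, 2, (239,))  # stand-in binary labels, one per patient

x = x.permute(0, 2, 1)           # [239, 5, 49]
loader = DataLoader(TensorDataset(x, y), batch_size=1, shuffle=True)

xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([1, 5, 49]) torch.Size([1])
print(len(loader))         # 239 batches
```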

In this case my understanding is that the Conv1d would be:

torch.nn.Conv1d(
    in_channels: 5 (features/rows),
    out_channels: 5,
    kernel_size: I can try different numbers here; the kernel will slide over the 49 time stamps in each of the 239 batches,
    stride: I can change it; it is the kernel's “step”,
    padding: depending on the kernel size, I might use padding,
    dilation: another hyperparameter I might tune,
    groups: not applicable for me,
    bias: bool = True,
    padding_mode: str = ‘zeros’
)
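How kernel_size, stride, padding and dilation change the output length follows the formula in the nn.Conv1d documentation: L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1. A sketch that checks it for a few arbitrary settings on an input of shape [239, 5, 49]:

```python
import torch
from torch import nn

def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    # Output-length formula from the nn.Conv1d documentation
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

x = torch.randn(239, 5, 49)  # [patients, features, time stamps]

settings = [
    dict(kernel_size=3),
    dict(kernel_size=3, padding=1),  # keeps the length for an odd kernel with stride 1
    dict(kernel_size=5, stride=2),
    dict(kernel_size=3, dilation=2),
]
lengths = []
for s in settings:
    conv = nn.Conv1d(in_channels=5, out_channels=5, **s)
    out_len = conv(x).shape[-1]
    assert out_len == conv1d_out_len(49, **s)
    lengths.append(out_len)
    print(s, "->", out_len)
# lengths: [47, 49, 23, 45]
```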

Thank you in advance for any comments and your reply, if any.

Sincerely,
Alice

Nikronic · October 6, 2020, 8:39pm

Yes, you are absolutely correct.

Here is a snippet for your case:

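import torch
from torch import nn

x = torch.ones((239, 49, 5))  # [patients, time stamps, features]
x = x.permute((0, 2, 1))      # -> [patients, features, time stamps] = [239, 5, 49]
print(x.shape)
model = nn.Conv1d(5, 5, 3)    # in_channels=5, out_channels=5, kernel_size=3
output = model(x)
print(output.shape)           # [239, 5, 47]: the window fits at 49 - 3 + 1 = 47 positions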

If you want to understand a little more about what happens in each channel, you can play with the above example like this:

import torch
from torch import nn

x = torch.ones((239, 49, 5))
x = x.permute((0, 2, 1))  # swap the feature and temporal dimensions -> [239, 5, 49]
print(x.shape)
model = nn.Conv1d(5, 5, 49, groups=5, bias=False)  # remove bias and set the window to the whole temporal sequence
nn.init.constant_(model.weight, 1.)  # set kernel to one (instead of random)
output = model(x)
print(output.shape)  # [239, 5, 1]
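To see what groups=5 changes, it can be compared with the default groups=1 on the same all-ones input. A sketch under the same setup as the snippet above: with groups=5 each filter only sees its own input channel, so every output is the sum of 49 ones; with groups=1 each filter sums over all 5 channels, giving 5 x 49 = 245:

```python
import torch
from torch import nn

x = torch.ones((239, 49, 5)).permute((0, 2, 1))  # [239, 5, 49]

results = {}
for groups in (5, 1):
    model = nn.Conv1d(5, 5, 49, groups=groups, bias=False)
    nn.init.constant_(model.weight, 1.)
    with torch.no_grad():
        output = model(x)
    # weight shape: [out_channels, in_channels // groups, kernel_size]
    results[groups] = (tuple(model.weight.shape), tuple(output.shape), output[0, 0, 0].item())
    print(groups, results[groups])
# 5 ((5, 1, 49), (239, 5, 1), 49.0)
# 1 ((5, 5, 49), (239, 5, 1), 245.0)
```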