Problem with conv1d

F.conv1d(autograd.Variable(torch.randn(20, 1, 4 * 200)), autograd.Variable(torch.randn(256, 1, 2 * 200)), stride=200, padding=0, dilation=2).size()

I think this should return a tensor of size (20, 256, 2),

but I got a tensor of size (20, 256, 1). What is the problem?

Let’s calculate the output size manually.
Your input length is 800 and your kernel_size is 400.
Using a stride of 200, this would result in 3 output values.
The output values would be calculated from the following input windows: input[0:400], input[200:600], input[400:800].
Since you are using a dilation of 2, your kernel is spread over an effective size of dilation * (kernel_size - 1) + 1 = 799.
Therefore, you only get one output: the “active” kernel nearly fills the whole 800-sample input, so there is no room to shift it by another stride of 200.
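
For reference, here is a small sketch of the output-length formula from the Conv1d docs applied to your numbers (the helper name conv1d_out_len is just for illustration):

```python
def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    # L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

print(conv1d_out_len(800, 400, stride=200))              # 3 (without dilation)
print(conv1d_out_len(800, 400, stride=200, dilation=2))  # 1 (your setting)
```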

Have a look at @ezyang’s convolution visualizer and use your settings to get a feeling for the operation.
@ezyang Setting the input size or stride to 0 makes the page disappear. :wink: Awesome tool, btw!

@ptrblck thank you for replying.

However, if I use
F.conv1d(autograd.Variable(torch.randn(20, 200, 4)), autograd.Variable(torch.randn(256, 200, 2)), stride=1, padding=0, dilation=2).size()

I got a tensor of size (20, 256, 2). Why? What is the difference between the two?

The difference between the two code snippets is which dimension holds the channels and which holds the length.
For Conv1d the input dimensions are defined as [batch, channel, length].

In this example you are creating an input with 200 channels and a length of 4.
With kernel_size=2 and dilation=2 the kernel spans 3 positions, so with a stride of 1 you will get an output length of 2.
Try to visualize the operation using the link above.
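
Plugging the second snippet into the same formula (reusing the conv1d_out_len sketch from above):

```python
# kernel_size=2, dilation=2 -> the kernel spans 2 * (2 - 1) + 1 = 3 positions
print(conv1d_out_len(4, 2, stride=1, dilation=2))  # 2
```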

What exactly is your use case? Could you explain a bit what you would like to achieve?

Thank you @ptrblck.
I found that the link above can only show the results of Conv2d (H * W), while my case is Conv1d.

Of course, here is my use case:

I have a representation of size (20, 1, 4 * 200) (batch_size, in_channels, in_length), and I want to convolve it with out_channels=256, kernel_size=2, stride=200, and a dilation of 2 * 200 (I know the error happens here). The output tensor’s size should be (20, 256, 2) (batch_size, out_channels, out_length).

If I change the dilation to 2 * 200:
F.conv1d(autograd.Variable(torch.randn(20, 1, 4 * 200)), autograd.Variable(torch.randn(256, 1, 2)), stride=200, padding=0, dilation=2 * 200).size()
I got a tensor of size (20, 256, 2). I think this should be the same as

F.conv1d(autograd.Variable(torch.randn(20, 200, 4)), autograd.Variable(torch.randn(256, 200, 2)), stride=1, padding=0, dilation=2).size()

Right?

You could look at the 2d example using the link and just watch how the convolution is performed in the first row.

Regarding your use case:
You will get the same output shape, but the operation is different, since each kernel multiplies its receptive field with all input channels.

In your first example each kernel will multiply its weights with only 2 input values at two different indices, since there is only 1 input channel.

In your second example each kernel will also multiply its weights at two indices, but with all 200 input channels.
That’s why the weight matrix is much bigger in this case.

Even though the output size is the same, the operation is quite different.
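
To make that concrete, here is a quick sketch comparing the two weight tensors from your calls:

```python
import torch

w1 = torch.randn(256, 1, 2)    # first case: 1 input channel, kernel_size 2
w2 = torch.randn(256, 200, 2)  # second case: 200 input channels, kernel_size 2

# Each of the 256 kernels sees 1 * 2 vs. 200 * 2 input values per output position
print(w1.numel())  # 512
print(w2.numel())  # 102400
```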

@ptrblck Sorry for the late reply.

I actually use a CNN for a natural language processing task, and I am not sure which approach I should use to encode the sentence. (Yoon Kim 2014) convolves one sentence by setting in_channels to 1, and I guess CNNs for image-related tasks will have multiple input channels, right?

I’m not familiar with this paper, but if they use in_channels=1, you should use your first approach (input dim = [20, 1, 4*200]).

In a lot of cases, you will feed color images into your ConvNet, so the input will have a dimension of [batch, 3, h, w]. In the following layers the channel dimension is defined by e.g. the number of kernels in your conv layers.
However, gray-scale images will have a single channel, i.e. [batch, 1, h, w].
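
For example (a minimal sketch with made-up sizes):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 32, 32)                      # color images: [batch, 3, h, w]
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # 16 kernels -> 16 output channels
print(conv(x).size())                              # torch.Size([8, 16, 32, 32])
```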

Hi @ptrblck, you are very kind, and you are right: CNNs are very popular in image processing tasks and, I think, not so suitable for NLP tasks. But anyway, thank you very much; you helped me solve my issue and understand CNNs better.

Well, I wouldn’t say CNNs are not suitable for NLP tasks; I think they are just not the common choice.
However, this paper was released last week and claims that CNN architectures are indeed better at sequence modelling than their recurrent counterparts like LSTMs.
I haven’t finished reading the paper yet, but you should have a look at it and keep going with your approach! :wink:


Oh, that’s great! I think this paper will be very helpful for me. Thank you very much! I will read it later.


Hi @ptrblck, I have another question about CNNs. In your experience, what is the best choice for optimizing the parameters of a CNN: SGD, Adadelta, or Adam? I have tried them, but the results were not very good. Can you give me some suggestions? Thank you.

I usually start with Adam and try out some learning rates between 1e-2 and 1e-4. If you can overfit your training data, you should concentrate on regularization.
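
As a minimal sketch (the model here is just a placeholder):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Conv1d(1, 256, kernel_size=2, stride=200, dilation=400)  # placeholder model
# Start with Adam and sweep the learning rate, e.g. 1e-2, 1e-3, 1e-4
optimizer = optim.Adam(model.parameters(), lr=1e-3)
```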