Conv1d internal weight calculations

Keith72 · August 9, 2020, 12:11am

Hi, I am trying to figure out how an nn.Conv1d processes an input, for a specific example related to audio processing in a WaveNet model. I have input data of shape (1,1,8820), which passes through an input layer (1,16,1) to give an output of shape (1,16,8820). That part I understand, because you can just multiply the two matrices. The next layer is a Conv1d with kernel size=3, input channels=16, output channels=16, so the state dict shows a weight matrix of shape (16,16,3). When the input of shape (1,16,8820) goes through that layer, the result is another (1,16,8820). What multiplication steps occur within the layer to apply the weights to the audio data? In other words, if I wanted to apply the layer (forward calculations only) using only numpy for this example, how would I do that?

yoyololicon · August 9, 2020, 4:46am

Hi @Keith72, this is what PyTorch's conv1d actually does in your case (with a padding of 1, which is what keeps the output length at 8820):

import torch

x = torch.rand(1, 16, 8820)
weight = torch.rand(16, 16, 3)

# first pad zeros along the time dimension
x = torch.nn.functional.pad(x, [1, 1])    # shape = (1, 16, 8822)

# unfold, so you have 8820 moving windows with size = (16, 3)
x = x.unfold(2, 3, 1)    # shape = (1, 16, 8820, 3)

# contract over input channels and kernel taps; I use tensordot for simplicity.
# tensordot returns (1, 8820, 16), so move the channel axis back to dim 1
y = torch.tensordot(x, weight, dims=([1, 3], [1, 2])).transpose(1, 2)    # shape = (1, 16, 8820)

In numpy you can simply replace pad and tensordot with the corresponding numpy functions (numpy.pad and numpy.tensordot); for unfold you can use numpy.lib.stride_tricks.as_strided.
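The three steps above (pad, unfold, contract) can be sketched in NumPy roughly as follows. This is only a minimal sketch, assuming stride 1, zero padding of 1 on each side, and no bias term; the function name `conv1d_numpy` is made up for illustration and is not part of any library. Like PyTorch's conv1d, it computes a cross-correlation, so the kernel is not flipped.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def conv1d_numpy(x, weight, padding=1):
    """Forward pass of a stride-1 1-D convolution without bias (illustrative helper).

    x:      (batch, in_channels, time)
    weight: (out_channels, in_channels, kernel_size)
    """
    kernel_size = weight.shape[2]
    # Zero-pad the time axis only.
    x = np.pad(x, ((0, 0), (0, 0), (padding, padding)), mode="constant")
    batch, channels, padded_len = x.shape
    n_windows = padded_len - kernel_size + 1
    sb, sc, st = x.strides
    # Read-only view of shape (batch, in_channels, n_windows, kernel_size);
    # stepping along either of the last two axes moves one sample in time.
    windows = as_strided(
        x,
        shape=(batch, channels, n_windows, kernel_size),
        strides=(sb, sc, st, st),
        writeable=False,
    )
    # Sum over input channels (c) and kernel taps (k) for each output channel (o).
    return np.einsum("bctk,ock->bot", windows, weight)


x = np.random.rand(1, 16, 8820)
weight = np.random.rand(16, 16, 3)
y = conv1d_numpy(x, weight)
print(y.shape)  # (1, 16, 8820)
```

einsum is used here instead of tensordot because tensordot always places the uncontracted axes of its first argument before those of its second, so its result would still need the axes reordered to (batch, out_channels, time).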

Keith72 · August 9, 2020, 11:59am

Thanks for the quick response! My initial implementation seems to fit using your steps, except the last step gave me a shape (16,1,8820), so I just swapped the first two dimensions. Now if I wanted to account for layer dilation, how would that work?

yoyololicon · August 10, 2020, 3:21am

That can be achieved easily by using indexing:

x = torch.nn.functional.pad(x, [dilation, dilation])
x = x.unfold(2, 2 * dilation + 1, 1)[..., ::dilation]    # shape = (1, 16, 8820, 3)
...

In numpy you can alter the strides of the ndarray to do dilated convolution. Here’s an example implementation; you can check it for details.
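To illustrate the stride idea, here is a rough NumPy sketch (not the linked implementation). It assumes stride 1, no bias, and zero padding equal to the dilation on each side; `dilated_windows` is a hypothetical helper name, and the dilation value of 4 is chosen only as an example. The only change from an undilated window view is that the stride along the kernel axis is multiplied by the dilation.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def dilated_windows(x, kernel_size, dilation):
    """Return a read-only view of shape (batch, channels, n_windows, kernel_size).

    Consecutive taps inside one window are `dilation` samples apart.
    """
    batch, channels, length = x.shape
    span = dilation * (kernel_size - 1) + 1  # samples covered by one window
    n_windows = length - span + 1
    sb, sc, st = x.strides
    return as_strided(
        x,
        shape=(batch, channels, n_windows, kernel_size),
        # Window start advances by one sample; taps advance by `dilation` samples.
        strides=(sb, sc, st, st * dilation),
        writeable=False,
    )


dilation = 4  # example value, not taken from the thread
x = np.random.rand(1, 16, 8820)
weight = np.random.rand(16, 16, 3)

# Pad by `dilation` on each side so the output keeps 8820 time steps.
x_padded = np.pad(x, ((0, 0), (0, 0), (dilation, dilation)), mode="constant")
windows = dilated_windows(x_padded, kernel_size=3, dilation=dilation)

# Sum over input channels (c) and kernel taps (k) for each output channel (o).
y = np.einsum("bctk,ock->bot", windows, weight)
print(y.shape)  # (1, 16, 8820)
```

Because the view shares memory with the padded array, no intermediate window of length 2 * dilation + 1 is ever materialised, unlike the unfold-then-slice approach.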

Keith72 · August 10, 2020, 11:51am

The example helps a lot, and thank you for taking the time to explain all that.