Understanding input shape to PyTorch conv1D?

This seems to be one of the common questions here (1, 2, 3), but I am still struggling to define the right input shape for PyTorch's Conv1d.

I have text sequences of length 512 (number of tokens per sequence) with each token being represented by a vector of length 768 (embedding). The batch size I am using is 6.

So my input tensor to Conv1d has shape [6, 512, 768].

import torch

input = torch.randn(6, 512, 768)  # (batch_size, seq_len, embedding_dim)

Now, I want to convolve over the length of my sequence (512) with a kernel size of 2 using PyTorch's Conv1d layer.

I assumed that “in_channels” is the embedding dimension. If so, then the Conv1d layer would be defined as follows, where

in_channels = embedding dimension (768)

out_channels = 100 (arbitrary number)

kernel_size = 2

import torch.nn as nn

convolution_layer = nn.Conv1d(768, 100, 2)
feature_map = convolution_layer(input)

But with this assumption, I get the following error:

RuntimeError: Given groups=1, weight of size 100 768 2, expected input `[4, 512, 768]` to have 768 channels, but got 512 channels instead
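For reference, the "weight of size 100 768 2" in the message is the layer's weight tensor, which nn.Conv1d stores as (out_channels, in_channels, kernel_size):

convolution_layer = nn.Conv1d(768, 100, 2)
print(convolution_layer.weight.shape)  # torch.Size([100, 768, 2])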

Then I assumed that “in_channels” is the sequence length of the input. If so, then the Conv1d layer would be defined as follows, where

in_channels = sequence length (512)

out_channels = 100 (arbitrary number)

kernel_size = 2

convolution_layer = nn.Conv1d(512, 100, 2)
feature_map = convolution_layer(input)  # feature_map.shape: torch.Size([6, 100, 767])

This runs fine and produces a feature map of shape [batch_size, 100, 767]. However, I am confused: shouldn't the convolution slide over the sequence length of 512 and output a feature map of shape [batch_size, 100, 511]?

I will be really grateful for your help.

I think the confusion here stems from the fact that PyTorch defaults to a channels-first layout: for 2D convolutions the memory format is NCHW, i.e. (batch, channels, height, width), and Conv1d analogously expects (batch, channels, length). So in_channels is indeed the embedding dimension (768), but it has to be the second dimension of the input tensor, not the last.

Therefore, in your example, if you rearrange the input tensor into that layout, everything works as expected:

input = torch.randn(6, 768, 512)  # (batch, embedding_dim = in_channels, seq_len)
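(Your second attempt "worked" only because, with in_channels = 512, the kernel slid over the embedding dimension instead: 768 - 2 + 1 = 767 positions, which is exactly the [batch_size, 100, 767] you observed.)

A minimal end-to-end sketch, keeping your original (batch, seq_len, embedding_dim) data and transposing the last two dimensions before the convolution:

import torch
import torch.nn as nn

input = torch.randn(6, 512, 768)             # (batch, seq_len, embedding_dim)
input = input.transpose(1, 2)                # -> (6, 768, 512) = (batch, embedding_dim, seq_len)
convolution_layer = nn.Conv1d(768, 100, 2)   # in_channels=768, out_channels=100, kernel_size=2
feature_map = convolution_layer(input)
print(feature_map.shape)                     # torch.Size([6, 100, 511]); 512 - 2 + 1 = 511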

Note that PyTorch v1.5 shipped support for changing the memory format to one that is (arguably) more intuitive, namely NHWC, also known as "channels last". An additional benefit of channels last is that it can lead to drastic performance improvements in some specific scenarios due to cuDNN optimizations.
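As a quick sketch of that API (torch.channels_last applies to 4D tensors, so this uses a Conv2d; the logical shape is still NCHW, only the underlying memory layout changes):

import torch
import torch.nn as nn

x = torch.randn(6, 3, 32, 32).to(memory_format=torch.channels_last)
conv = nn.Conv2d(3, 16, 3).to(memory_format=torch.channels_last)
out = conv(x)
print(out.shape)                                             # torch.Size([6, 16, 30, 30]) -- still indexed as NCHW
print(out.is_contiguous(memory_format=torch.channels_last))  # True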

Some references regarding the memory format:

https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html


There is more context in the original Stack Overflow question: https://stackoverflow.com/questions/62372938/understanding-input-shape-to-pytorch-conv1d
