Conv1d or Conv2d for 1d vectors with batch size?

I know this question is not new, but after reading many explanations I am still confused about the parameters required for Conv1d. I have a dataset with 154k rows, and each row is a 1D array of 289 floats. I want to train my model in batches with batch size = 50 (this is dynamic). My model also contains other layer types, such as Linear layers. The source code looks like this:

    self.linear1 = nn.Linear(noise_dim, sequenceLength)
    self.linear2 = nn.Linear(sequenceLength, 2*sequenceLength)
    self.linear3 = nn.Linear(2*sequenceLength, 2*sequenceLength)
    self.linear4 = nn.Conv1d(2*sequenceLength, sequenceLength,kernel_size=sequenceLength+1, stride=1, padding=0)

I expect Conv1d to reduce the input size from 2*sequenceLength to sequenceLength. However, it raises the following error (sequenceLength is 289 at runtime):

    Given groups=1, weight of size [289, 578, 290], expected input[1, 50, 578] to have 578 channels, but got 50 channels instead

It seems to take my batch size into account. If so, would I need to update the model whenever I change the batch size during training?
Moreover, if the batch size is counted together with the input, I think it should be Conv2d, am I right?

Thank you

nn.Conv1d expects a batched 3-dimensional input in the shape [batch_size, in_channels, seq_length] or an unbatched 2-dimensional input in the shape [in_channels, seq_length]. Based on the error message, you are using the second approach and the layer unsqueezes the batch dimension for you: your [50, 578] activation is interpreted as 50 channels with a sequence length of 578, which does not match the 578 input channels the layer was created with. Note that conv layers use a kernel and apply a convolution (or rather cross-correlation) on the input. To simply reduce the dimension of the input, you could still apply a linear layer. In case your activation has a temporal dimension and a kernel should be applied over it, you might want to unsqueeze the channel dimension manually.
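As a rough sketch of the two options above (the names `reduce` and `conv` are only placeholders, and the single-channel layout in the second option is an assumption about how you want to interpret your data), the shapes would work out like this:

```python
import torch
import torch.nn as nn

seq_len = 289      # sequenceLength from the question
batch_size = 50    # not part of any layer definition, so it can change freely

# Stand-in for the output of linear3: [batch_size, 2 * seq_len]
x = torch.randn(batch_size, 2 * seq_len)

# Option 1: reduce the feature dimension 578 -> 289 with a linear layer
reduce = nn.Linear(2 * seq_len, seq_len)
out_linear = reduce(x)                      # [50, 289]

# Option 2: treat the 578 values as the temporal dimension of a single channel
conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=seq_len + 1)
out_conv = conv(x.unsqueeze(1))             # [50, 1, 578] -> [50, 1, 289]
out_conv = out_conv.squeeze(1)              # [50, 289]

print(out_linear.shape, out_conv.shape)
```

With stride 1 and no padding, the conv output length is 578 - 290 + 1 = 289. Neither layer depends on the batch size, so the model does not need to change when the batch size does.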

The in_channels argument should be a single int value not a list.

I think I got your idea, thank you very much

Hi @ptrblck
This is my new update

    def __init__(self, noise_dim, sequenceLength):
        self.linear1 = nn.Linear(noise_dim, sequenceLength)
        self.linear2 = nn.Linear(sequenceLength, 2*sequenceLength)
        self.linear3 = nn.Linear(2*sequenceLength, 2*sequenceLength)
        self.conv4 = nn.Conv1d(2*sequenceLength, sequenceLength, kernel_size=3, padding=0)
        self.lrelu = nn.LeakyReLU(0.01)
        self.tanh = nn.Tanh()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        out = self.linear1(x)
        out = self.lrelu(out)
        out = self.linear2(out)
        out = self.tanh(out)
        out = self.linear3(out)
        out = self.lrelu(out)
        out = out.unsqueeze(2)
        out = self.conv4(out)
        out = out.squeeze()
        out = self.sigmoid(out)
        return out

I added unsqueeze(2) in the forward function and it works fine, except for the kernel size.
If I increase the kernel size to any number greater than 1, I receive this error:

    Calculated padded input size per channel: (1). Kernel size: (3). Kernel size can't be greater than actual input size

My sequence length is 289 (at runtime) with one dimension, so I wonder why the kernel size cannot be something like 3 or 4, and so on.

Because you are unsqueezing the temporal dimension and using the sequence length as the channels, so the temporal dimension has a size of 1 and a kernel larger than 1 does not fit. Check the expected input shape described in my previous post.
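To make the difference concrete, here is a small sketch comparing the two unsqueeze positions. The layer names `conv_channels` and `conv_temporal` and the single-channel layer are illustrative assumptions, not taken from your model:

```python
import torch
import torch.nn as nn

x = torch.randn(50, 578)  # stand-in for the output of linear3

# unsqueeze(2): 578 channels, temporal length 1
as_channels = x.unsqueeze(2)   # [50, 578, 1]
# unsqueeze(1): 1 channel, temporal length 578
as_temporal = x.unsqueeze(1)   # [50, 1, 578]

# A kernel of size 3 cannot slide over a temporal dimension of length 1
conv_channels = nn.Conv1d(578, 289, kernel_size=3)
try:
    conv_channels(as_channels)
    failed = False
except RuntimeError as err:
    failed = True
    print("Error:", err)

# The same kernel size works when the 578 values are the temporal dimension
conv_temporal = nn.Conv1d(1, 1, kernel_size=3)
out = conv_temporal(as_temporal)   # [50, 1, 576]
print(out.shape)
```

In the second case the kernel actually slides along the sequence, and the output length is 578 - 3 + 1 = 576.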

@ptrblck, I changed the padding to 1, and it works (with kernel size = 3).
But according to your answer, if the sequence becomes the channels, I wonder whether the kernel still works correctly. Or does it just return a result without striding over the input?
Thank you

Padding the input avoids the error, but note that only the “center” element of the conv kernel will be used, since the other kernel elements only ever see the zero padding, as seen here:

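    import torch
    import torch.nn as nn

    conv = nn.Conv1d(10, 1, kernel_size=3, padding=1, bias=False)
    x = torch.ones(1, 10, 1)  # temporal length 1, padded to 3

    out = conv(x)
    print(out)
    # tensor([[[0.7653]]], grad_fn=<ConvolutionBackward0>)
    # Summing only the center weights gives the same value
    # (the exact number depends on the random initialization)
    print(conv.weight[:, :, 1].sum())
    # tensor(0.7653, grad_fn=<SumBackward0>)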
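The same check can be done with a non-constant input. This is just a sketch under the same setup (temporal length 1, padding=1); the batch size of 4 and the variable names are arbitrary choices for illustration. It compares the conv output against a plain matrix product with the center weights only:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv1d(10, 1, kernel_size=3, padding=1, bias=False)
x = torch.randn(4, 10, 1)   # [batch, channels, temporal length 1]

out_full = conv(x)          # [4, 1, 1]

# Use only the center tap of the kernel: weight[:, :, 1] has shape [1, 10]
center = conv.weight[:, :, 1]
out_center = (x.squeeze(2) @ center.t()).unsqueeze(2)   # [4, 1, 1]

same = torch.allclose(out_full, out_center, atol=1e-5)
print(same)
```

If the two results match, the layer is effectively acting as a linear layer over the channels, and the outer kernel elements do not contribute to the output.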