How to apply 1D convolution to data with batch size 1?

Hi, I want to apply a 1D convolution to input data of shape (128, 64), but I am getting an error even though I provided input of shape (1, 128, 64) to the CNN model. Can you help me fix it?

import torch
import torch.nn as nn

# Define our input data (128 samples, each with a sequence of 64 features)

input_data = torch.randn(128, 64)
input_data1 = input_data.unsqueeze(0)
print(input_data.shape)

# Define the 1D CNN model

conv1d = nn.Conv1d(in_channels=64, out_channels=32, kernel_size=3)

output = conv1d(input_data1)
print(output.shape)
print(output.unsqueeze(0).shape)

RuntimeError                              Traceback (most recent call last)
in <cell line: 13>()
     11
     12
---> 13 output = conv1d(input_data)
     14 # Apply the convolution to the input data
     15 #output = conv1d(input_data.unsqueeze(0)) # Add a batch dimension with .unsqueeze(0)

3 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
    304                             weight, bias, self.stride,
    305                             _single(0), self.dilation, self.groups)
--> 306         return F.conv1d(input, weight, bias, self.stride,
    307                         self.padding, self.dilation, self.groups)
    308

RuntimeError: Given groups=1, weight of size [32, 64, 3], expected input[1, 128, 64] to have 64 channels, but got 128 channels instead

nn.Conv1d expects either a batched input in the shape [batch_size, channels, seq_len] or an unbatched input in the shape [channels, seq_len].
In your example you are using the first approach by explicitly unsqueezing the batch dimension, so the 128 samples will be interpreted as the channel dimension.
I’m unsure if you want to treat the input as a single sample containing 128 time steps, each with a channel dimension of 64, but if so this should work:

input_data1 = input_data1.permute(0, 2, 1).contiguous()
# input_data1 should now have the shape [1, 64, 128]
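
With the permute applied, the layer defined in your snippet should then run; a minimal check (the output length follows from kernel_size=3, so 128 - 3 + 1 = 126):

output = conv1d(input_data1)  # input is now [1, 64, 128]
print(output.shape)           # torch.Size([1, 32, 126])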

@ptrblck , I have an unbatched input. In my case I have 128 samples in total and every single sample has a dimension of 64. By providing the unbatched input, I got a Conv1d output of shape (32, 126).
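
For reference, a minimal sketch of that unbatched call, assuming the 64 features are moved into the channel dimension as discussed above:

x = torch.randn(64, 128)   # unbatched: [channels=64, seq_len=128]
conv1d = nn.Conv1d(in_channels=64, out_channels=32, kernel_size=3)
out = conv1d(x)
print(out.shape)           # torch.Size([32, 126])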

Commonly you would refer to the number of elements in the batch dimension as samples, so I assume your “samples” refer to a temporal dimension instead.

I assume you got this shape using my approach and this shape matches your expectation?

@ptrblck , yes this shape matches my expectation.

@ptrblck , if we have input of shape (128, 64) and I want to predict a class for every single sample, do you think a pooling layer after the Conv1d layer would be helpful? A pooling layer reduces the sequence size, but I want to predict exactly 128 classes, one per sample.

After permuting the input as seen in my previous post the shape would be [batch_size=1, channels=64, seq_len=128], so the batch contains a single sample with 64 channels and 128 time steps.
The shape you’ve posted is again wrong as you are treating the 128 time steps as “samples”, while you’ve previously defined a batch size of 1.
If you keep the shape as shown in my post as [1, 64, 128] you will be able to classify each time step with a target in the shape [1, 128] containing values in the range [0, 63] (assuming there are 64 classes).
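
A minimal sketch of that setup (the classification head, num_classes, and the random target below are illustrative assumptions, not code from this thread):

num_classes = 64                                             # assumed class count
x = torch.randn(1, 64, 128)                                  # [batch, channels, seq_len]
head = nn.Conv1d(64, num_classes, kernel_size=3, padding=1)  # padding keeps seq_len at 128
logits = head(x)                                             # [1, num_classes, 128]
target = torch.randint(0, num_classes, (1, 128))             # one class index per time step
loss = nn.CrossEntropyLoss()(logits, target)                 # CE accepts [N, C, L] vs [N, L]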

@ptrblck , I am treating it as an unbatched element; my data is not batched. When I posted the question, I thought a CNN doesn't accept unbatched data.

It doesn’t matter since the model will unsqueeze dim0 internally if you pass an unbatched input.
The shapes are still as explained before.
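
A quick check of that behavior, reusing the conv1d layer defined earlier in the thread:

x = torch.randn(64, 128)                           # unbatched: [channels, seq_len]
out_unbatched = conv1d(x)                          # [32, 126]
out_batched = conv1d(x.unsqueeze(0)).squeeze(0)    # batch dim added, then removed
print(torch.allclose(out_unbatched, out_batched))  # True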

With Conv1d, a pooling layer will reduce the size of dim = -1 (the sequence dimension). This typically has the effect of filtering the important data from the unimportant data in the sequences.
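
For example, a max pooling layer with kernel_size=2 halves the sequence dimension (a sketch using the Conv1d output shape from earlier in the thread):

pool = nn.MaxPool1d(kernel_size=2)
x = torch.randn(1, 32, 126)   # e.g. the batched Conv1d output from above
print(pool(x).shape)          # torch.Size([1, 32, 63])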

@Noorulain_Islam reading through the thread, it seems there may be a difference between the terms you are using and those typically used, and I want to make sure I'm on the same page as you. Can you provide a physical description of what 128 and 64 represent in your input data size?

@J_Johnson , in my case, 128 is the total number of training examples (total training instances) and 64 is the feature dimension for every single training example. When I pass this data to Conv1d, I reshape it to (64, 128) and then pass this unbatched data to the Conv1d layer to extract features. Conv1d gives me an output of shape (32, 126). However, if I use padding='same' then I get an output of shape (32, 128).
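
For reference, a minimal sketch of the padding='same' variant (supported by Conv1d with the default stride of 1):

conv_same = nn.Conv1d(in_channels=64, out_channels=32, kernel_size=3, padding='same')
x = torch.randn(64, 128)      # unbatched: [channels, seq_len]
print(conv_same(x).shape)     # torch.Size([32, 128]): sequence length preserved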

Noted, so dim 0 represents 128 samples and dim 1 represents 64 features.

My next question regards the features dim only.

Are these

  1. Temporally/spatially related to one another, i.e. sequential; OR
  2. Are they not sequentially related to one another?

Examples of sequentially related data would be stock close price over time, notes in a song, glucose levels over time, word tokens in a sentence, etc. In other words, the order is as important as the values.

Non-sequentially related data would be like (height, weight, hair color) of a person, (temperature, radius, luminosity, mass) of a star, etc. In other words, order of the values is not important.

@J_Johnson , my features are not sequentially related to one another.

Based on them not being sequentially related, I would suggest you use nn.Linear. The Linear layer is best suited for features that do not contain a spatial or temporal relation to one another, i.e. are non-sequential.
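
A minimal sketch of that approach (num_classes is a hypothetical placeholder for your actual number of classes):

num_classes = 10                        # hypothetical; set to your class count
linear = nn.Linear(in_features=64, out_features=num_classes)
x = torch.randn(128, 64)                # 128 samples, 64 features each
logits = linear(x)                      # [128, num_classes]: one prediction per sample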

Thank you for your suggestion.