I am using time-series data for binary classification. My training set is 4917 x 244, where 244 is the number of feature columns and 4917 is the number of onsets (windows). For example, with a sliding window of 2048 samples, each window yields a 1 x 244 feature vector, so I end up with 4917 such windows and their respective feature columns.
Now I am using a batch size of 256 to split the 4917 onsets into batches, so one batch is a 256 x 244 feature tensor.
I want to implement a 1D convolution to process this data and classify each window as class 0 or 1. In my initial attempts I got very confused about in_channels, out_channels, and kernel_size, and despite my best efforts I keep getting dimension-mismatch or index-out-of-range errors during training.
I would appreciate it if anyone could clearly explain the steps I can follow to implement a 1D CNN for my dataset.
I don’t understand this approach completely. Are you padding the input, and are you using the “feature” dimension as the channels? Are you then striding over the first dimension of size 4917?
I’m also unsure how this operation is performed, as it cannot be a reshape operation, so could you also explain in more detail how the input is processed?
For example: I have a time-series acoustic signal of an event of class 0 with, let’s say, 204800 samples in total. I slide a window of size 2048 over the samples with zero overlap, which gives 100 windows covering all 204800 samples. For each window I calculate 244 1D features (temporal, spectral, average MFCCs, etc.), i.e. a 1 x 244 feature vector per window, and thus 100 x 244 feature vectors for all 100 windows. Then I read the next .wav file, say for class 1 with 20480 samples; the same window size of 2048 with zero overlap returns 10 x 244 feature vectors. Appending these to the previous ones gives a feature matrix of 110 x 244, and I repeat this for all .wav files of class 0 and 1.
As I said in the query above, this yields 4917 x 244 feature vectors, evaluated with a window size of 2048 over all .wav files of class 0 and 1. In other words, I have a training set of X_train = 4917 x 244 and y_train = 4917 x 1.
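In code, the windowing and feature extraction look roughly like this (extract_features is a hypothetical placeholder for my actual 244 feature computations):

import numpy as np

def extract_features(window):
    # hypothetical placeholder for the actual temporal/spectral/MFCC features
    return np.zeros(244, dtype=np.float32)

def file_to_features(signal, win_size=2048):
    # non-overlapping windows; an incomplete tail window is dropped
    n_windows = len(signal) // win_size
    return np.stack([extract_features(signal[i * win_size:(i + 1) * win_size])
                     for i in range(n_windows)])

# a class-0 file with 204800 samples -> (100, 244)
X0 = file_to_features(np.random.randn(204800))
# a class-1 file with 20480 samples -> (10, 244)
X1 = file_to_features(np.random.randn(20480))
X = np.concatenate([X0, X1])  # (110, 244)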
Now I am using the batch size to split the training set into batches and have created a train_dataloader:
from torch.utils.data import DataLoader

train_dl = DataLoader(CageData(X_train, y_train),
                      batch_size=256,
                      shuffle=True,
                      num_workers=2,
                      pin_memory=False)
Thus for one epoch it will iterate ceil(4917 / 256) = 20 times (19 full batches of 256 plus a final batch of 53) to complete training on all 4917 examples.
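CageData here is just a thin map-style Dataset over the precomputed features; roughly like this (the exact class body is paraphrased, assuming X and y are NumPy arrays):

import torch
from torch.utils.data import Dataset

class CageData(Dataset):
    def __init__(self, X, y):
        # precomputed feature vectors and labels, e.g. (4917, 244) and (4917, 1)
        self.X = torch.as_tensor(X, dtype=torch.float32)
        self.y = torch.as_tensor(y, dtype=torch.float32)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]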
Similarly I have a test set of X_test = 626 x 244 and labels y_test = 626 x 1.
test_dl = DataLoader(CageData(X_test, y_test),
                     batch_size=256,
                     shuffle=True,
                     num_workers=2,
                     pin_memory=False)
Now I want to know how to feed this data into my 1D CNN for training and testing. What should my in_channels, out_channels, and kernel_size be, and what will the input dimension of the fully connected layer be?
Thanks for the update.
I assume the preprocessing is already done and your X_train/test as well as y_train/test datasets are already created.
If I understand your question correctly you now want to pass this data (from the DataLoader) into a 1d-CNN.
nn.Conv1d layers expect a 3D input in the shape [batch_size, channels, seq_len] where channels corresponds to the number of input channels and seq_len to the size of the temporal dimension which will be used for the actual convolution.
Your current input has a shape of [batch_size, nb_features] and it’s unclear what exactly these features represent. nn.Conv1d layers can be used e.g. for audio signals, where the input could be a stereo audio tensor in the shape [batch_size, nb_channels=2, seq_len=number_of_timesteps].
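As a quick shape check for this stereo example (the numbers are arbitrary):

import torch
import torch.nn as nn

x = torch.randn(256, 2, 2048)  # [batch_size, nb_channels=2, seq_len=2048]
conv = nn.Conv1d(in_channels=2, out_channels=16, kernel_size=3)
print(conv(x).shape)  # torch.Size([256, 16, 2046])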
I thus don’t know if you want to treat the feature dimension in your input as a temporal dimension, moving the conv filters through it (you would expect a temporal dependency between these data points in this case), or as channels.
Based on your explanation of the feature extraction, it seems you are computing a variety of features, and since these were created using a sliding window I would expect them to be correlated in the temporal dimension.
If so, you could unsqueeze(1) an additional dimension, creating a single channel, so that your input would have the shape [batch_size, nb_channels=1, seq_len=244].
In this case the nn.Conv1d layer would use in_channels=1 and an arbitrary number of out_channels. The same goes for the kernel_size, stride, etc., as you can define them yourself.
The number of in_features in the first linear layer used after flattening the intermediate activation would thus also depend on the setup you pick for all the conv layers.
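Here is a minimal sketch putting this together (all layer sizes, kernel sizes, etc. are arbitrary placeholders you would tune for your use case):

import torch
import torch.nn as nn

class WindowCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # treat the 244 features as the temporal dimension with a single input channel
        self.conv1 = nn.Conv1d(in_channels=1, out_channels=8, kernel_size=3)
        self.conv2 = nn.Conv1d(in_channels=8, out_channels=16, kernel_size=3)
        self.act = nn.ReLU()
        # without padding, each conv with kernel_size=3 shrinks seq_len by 2:
        # 244 -> 242 -> 240, so the flattened activation has 16 * 240 features
        self.fc = nn.Linear(16 * 240, 1)

    def forward(self, x):                # x: [batch_size, 244]
        x = x.unsqueeze(1)               # -> [batch_size, 1, 244]
        x = self.act(self.conv1(x))      # -> [batch_size, 8, 242]
        x = self.act(self.conv2(x))      # -> [batch_size, 16, 240]
        x = x.flatten(1)                 # -> [batch_size, 3840]
        return self.fc(x)                # logits for nn.BCEWithLogitsLoss

model = WindowCNN()
out = model(torch.randn(256, 244))
print(out.shape)  # torch.Size([256, 1])

If you change any conv setting, recompute the in_features of the linear layer accordingly (or run a dummy forward pass once and read the flattened size off the activation).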