How to give input to this model?

Hi there,
I am trying to implement a 1D CNN over raw audio data (mono channel). Each of my wav files has about 60k samples.
Now my model code goes like this:

self.conv_1 = nn.Conv1d(self.input_spec_size,self.cnn_filter_size,3,1)
self.max_pooling_1 = nn.MaxPool1d(3)

But it throws the error: Expected 3-dimensional input for 3-dimensional weight [64, 60000, 3], but got 2-dimensional input of size [40, 60000] instead

Here 64 is the number of output filters, 3 is the kernel size, and 40 is my batch size. Could someone please guide me on how to reshape this (60000, 1) input data?
#########################
I tried reshaping the input as

data.reshape(1,60000)

but then I get this error:

Given groups=1, weight of size [64, 60000, 3], expected input[40, 1, 60000] to have 60000 channels, but got 1 channels instead

For starters, it looks like your in_channels argument is taking the value 60000. I believe that needs to be 1.

Conv1d expects inputs in the shape (batch_size, n_channels, seq_length), so your data must be reshaped as (40, 1, 60000):

CN = torch.nn.Conv1d(in_channels=1, out_channels=64, kernel_size=3)
trial = torch.randn((40, 60000))
out = CN(trial.reshape(40, 1, 60000)) 

# out is of shape [40, 64, 59998] , 59998 being the expected number of samples after convolution
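Equivalently, unsqueeze can be used to add the channel dimension without hard-coding the sizes:

out = CN(trial.unsqueeze(1))  # unsqueeze(1) inserts a channel dim -> [40, 1, 60000]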

Thanks a lot, I have corrected that already, but another problem has come up. I hope you can help me with this; I am very new to PyTorch's LSTM and encoder layers.
In my model the 1D CNN is followed by LSTM layers, but at this encoder layer I am getting a syntax error:

self.encoder_layer=nn.TransformerEncoderLayer(d_model=self.hidden_size_lstm*2,dim_feedforward=512,nhead=self.num_heads_self_attn)

I can't figure out what's wrong; I have referred to the PyTorch docs too.

Can you share the full code and error message here?

import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.nn import functional as F

class CNN_1D(torch.nn.Module):
    def __init__(self, num_layers_lstm, num_heads_self_attn,
                 hidden_size_lstm, num_emo_classes):
        super(CNN_1D, self).__init__()

        self.num_layers_lstm=num_layers_lstm
        self.num_heads_self_attn=num_heads_self_attn
        self.hidden_size_lstm=hidden_size_lstm
        self.num_emo_classes=num_emo_classes
        #self.num_gender_class=num_gender_class


        self.layer1 = nn.Sequential(
            nn.Conv1d(self.input_spec_size,64,3,1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv1d(64,100,5,1),
            nn.BatchNorm2d(100),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2))
        self.layer3 = nn.Sequential(
            nn.Conv1d(100,100,7,1),
            nn.BatchNorm2d(100),
            nn.ReLU(),

        ###
        self.lstm = nn.LSTM(input_size=100, hidden_size=self.hidden_size_lstm,num_layers=self.num_layers_lstm,bidirectional=True,dropout=0.5,batch_first=True)
        ## Transformer
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=self.hidden_size_lstm*2,dim_feedforward=512,nhead=self.num_heads_self_attn)
        #self.gender_layer  = nn.Linear(self.hidden_size_lstm*4,self.num_gender_class)
        self.emotion_layer = nn.Linear(6664,self.num_emo_classes)

self.encoder_layer = nn.TransformerEncoderLayer(d_model=self.hidden_size_lstm*2,dim_feedforward=512,nhead=self.num_heads_self_attn)
^
SyntaxError: invalid syntax

In your code, some of the parameters are missing (for example, self.input_spec_size is used but never defined), and the nn.Sequential(...) of layer3 is never closed, which is what triggers the SyntaxError on the following line. I modified your code slightly and now it should be possible to initialize the class CNN_1D. Make changes wherever required.

import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.nn import functional as F

class CNN_1D(torch.nn.Module):
    def __init__(self, num_layers_lstm, num_heads_self_attn, hidden_size_lstm, num_emo_classes, input_spec_size):
        super(CNN_1D, self).__init__()
        self.num_layers_lstm = num_layers_lstm
        self.num_heads_self_attn = num_heads_self_attn
        self.hidden_size_lstm = hidden_size_lstm
        self.num_emo_classes = num_emo_classes
        self.input_spec_size = input_spec_size
        # self.num_gender_class=num_gender_class

        self.layer1 = nn.Sequential(
            nn.Conv1d(self.input_spec_size, 64, 3, 1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2))

        self.layer2 = nn.Sequential(
            nn.Conv1d(64, 100, 5, 1),
            nn.BatchNorm2d(100),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2))

        self.layer3 = nn.Sequential(
            nn.Conv1d(100, 100, 7, 1),
            nn.BatchNorm2d(100),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2))
        ###
        self.lstm = nn.LSTM(input_size=100, hidden_size=self.hidden_size_lstm, num_layers=self.num_layers_lstm,
                            bidirectional=True, dropout=0.5, batch_first=True)
        ## Transformer
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=self.hidden_size_lstm * 2, dim_feedforward=512,
                                                        nhead=self.num_heads_self_attn)
        # self.gender_layer  = nn.Linear(self.hidden_size_lstm*4,self.num_gender_class)
        self.emotion_layer = nn.Linear(6664, self.num_emo_classes)
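For example, with some hypothetical values for the constructor arguments (just to confirm the class now builds), it can be instantiated like this:

model = CNN_1D(num_layers_lstm=2, num_heads_self_attn=4,
               hidden_size_lstm=60, num_emo_classes=5, input_spec_size=1)
print(model)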

Still I am getting this error:

Given groups=1, weight of size [64, 7, 3], expected input[1, 1, 60000] to have 7 channels, but got 1 channels instead

I don't understand where these 7 channels come from. So I tried calculating the output at each layer:

import torch
import torch.nn as nn
I=torch.randn(40,1,60000) #### 40 is batch size,60000 is sample length

layer1 = nn.Sequential(
    nn.Conv1d(1,64,3,1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2))

O1=layer1(I)

It says: expected 4D input (got 3D input)

You’ll need to use BatchNorm1d(64).

Edit: I also noticed that you are using MaxPool2d. That needs to be changed to the 1d version as well. Using the 2D version will also affect the number of filters.
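For instance, a minimal sketch of the first block with everything switched to the 1d modules (the same change applies to layer2 and layer3):

layer1 = nn.Sequential(
    nn.Conv1d(1, 64, 3, 1),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=3, stride=2))

O1 = layer1(torch.randn(40, 1, 60000))  # -> [40, 64, 29998]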


Thank you so much, that was a silly mistake on my part.

Were you able to figure out the first error in my previous post?

Actually, let me try to clarify: I have calculated the output tensor size after the three Conv1d layers as
[40, 100, 7494]
where 40 is the batch size, 100 is the number of output features of the conv layer, and 7494 is the length of the output sequence… right?

Now I want to move on to the LSTM layer, where I am doing this:

rnn = nn.LSTM(7494, 60, 2, bidirectional=True, dropout=0.5, batch_first=True)
output = rnn(O3)

I have set the input size to 7494 (not 100), as using 100 was giving an error.
Here the output is a tuple. Now could you please tell me how to check the output size further, for the TransformerEncoderLayer and the final linear layer.

I think this shape error is arising here only.
Error:
Given groups=1, weight of size [64, 7, 3], expected input[1, 1, 60000] to have 7 channels, but got 1 channels instead

Here O3 is the output of the third Conv1d layer. Also, my confusion is that the previous output feature size of 100 seems to play no role.
@pchandrasekaran @Abhilash_Srivastava Sir, please help me

I = torch.randn(40,1,60000)

layer1 = nn.Sequential(
            nn.Conv1d(1, 64, 3, 1),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=3, stride=2))

layer2 = nn.Sequential(
            nn.Conv1d(64, 100, 5, 1),
            nn.BatchNorm1d(100),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=3, stride=2))

layer3 = nn.Sequential(
            nn.Conv1d(100, 100, 7, 1),
            nn.BatchNorm1d(100),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=3, stride=2))
op = layer1(I)
print(op.shape)

op = layer2(op)
print(op.shape)

op = layer3(op)
print(op.shape)

op is now [40, 100, 7494]
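(For reference, with no padding each Conv1d/MaxPool1d shortens the sequence as L_out = floor((L_in - kernel_size) / stride) + 1, so the three prints should show [40, 64, 29998], [40, 100, 14996] and finally [40, 100, 7494].)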

rnn = nn.LSTM(7494, 60, 2, bidirectional=True, dropout=0.5, batch_first=True)
op, not_needed = rnn(op)

In the above, op is your output. Ignore the variable not_needed for now. op will be of shape (batch, seq, num_directions * hidden_size), which checks out here.

op shape now = [40, 100, 120]

Here’s where I’m not too sure. I have limited experience using LSTMs, and looking at my code, I’ve only ever used the final sequence. Depending on what you are going to do, you can leave op as is, or if you are taking the final sequence only, reshape it as op = op[:, -1, :].reshape(40, 1, 120)

encoder_layer = nn.TransformerEncoderLayer(d_model=120, dim_feedforward=512, nhead=your_nhead)
op = encoder_layer(op)

op will now have one of 2 shapes: one if you used all the sequences, another if you used only the last.

[40, 100, 120]
[40, 1, 120]

From here, your Linear Layers are [100x120] or [1x120].

The 100 that you mentioned is used as the sequence length for your LSTM.
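Just to make the two options concrete, here is a rough end-to-end sketch of the shapes (nhead=4 and 5 emotion classes are placeholder values; note I am also passing batch_first=True to TransformerEncoderLayer, which needs a fairly recent PyTorch; with the default batch_first=False you would have to permute to (seq, batch, d_model) first):

import torch
import torch.nn as nn

op = torch.randn(40, 100, 120)                 # LSTM output: (batch, seq, 2 * hidden_size)
op_last = op[:, -1, :].reshape(40, 1, 120)     # keep only the last sequence step

encoder_layer = nn.TransformerEncoderLayer(d_model=120, dim_feedforward=512,
                                           nhead=4, batch_first=True)
enc_all = encoder_layer(op)                    # -> [40, 100, 120]
enc_last = encoder_layer(op_last)              # -> [40, 1, 120]

emotion_layer = nn.Linear(120, 5)              # 5 = placeholder num_emo_classes
pred = emotion_layer(enc_last)                 # -> [40, 1, 5]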


@pchandrasekaran Thanks
This 100 is not mentioned anywhere in the LSTM layer, so is it picked up automatically?
And how do I choose between the options of 1 and 100? Yes, this way I am getting the shape [40, 100, 120],

and in the last linear layer
############
self.emotion_layer = nn.Linear(120, self.num_emo_classes)
############

I have read a blog where it's written that 'The input of our fully connected nn.Linear() layer requires an input size corresponding to the number of hidden nodes in the preceding LSTM layer. Therefore we must reshape our data into the form'

So I wrote 120 here,
and finally defined the forward function as:

def forward(self, inputs):
    out = self.layer1(inputs)
    out = self.layer2(out)
    out = self.layer3(out)
    out = self.lstm(out)
    out = self.encoder_layer(out)

    pred = self.emotion_layer(out)
    return pred

Error: Given groups=1, weight of size [64, 7, 3], expected input[1, 1, 60000] to have 7 channels, but got 1 channels instead

I hope I am clear now sir?

I have read a blog where it's written that 'The input of our fully connected nn.Linear() layer requires an input size corresponding to the number of hidden nodes in the preceding LSTM layer.'

That explains why my code only uses the last sequence. Thanks for that. So use only the last sequence of the LSTM. For the error, could you post your entire code snippet here so I can try to reproduce it? That looks like an error in the convolution layers.
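In the meantime, one quick thing to check (model and batch are just placeholder names here): that error message means the first Conv1d was constructed with in_channels=7, so printing the first conv module and the shape of the batch you feed in should show where the 7 comes from:

print(model.layer1[0])   # e.g. Conv1d(7, 64, kernel_size=(3,), stride=(1,)) would confirm in_channels=7
print(batch.shape)       # should be (batch_size, 1, 60000) for mono raw audio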


The code is in 4 parts and I don't know how to post it here. Can I mail you the entire code?