Music Encoder model

Hello all,
I am trying to build a CNN model that suggests encoder parameters for audio data.
Input: audio file (length: 480000 samples)
Output: parameter sequence (length: 469, e.g. [0001111222001233312000])

I am getting very low validation and test accuracy.

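import torch
import torch.nn as nn

# Kernel_size_conv is defined elsewhere (assumed odd, so the padding below preserves the length)

class AudioClassifier(nn.Module):
    def __init__(self):
        super().__init__()

        # Two length-preserving convolutions ("same" padding for an odd kernel size)
        self.conv11 = nn.Conv1d(in_channels=1, out_channels=16, kernel_size=Kernel_size_conv, stride=1,
                                bias=True, padding=(Kernel_size_conv - 1) // 2)
        self.conv12 = nn.Conv1d(in_channels=16, out_channels=16, kernel_size=Kernel_size_conv, stride=1,
                                bias=True, padding=(Kernel_size_conv - 1) // 2)
        # Strided convolution: downsamples by 1024 and emits 4 channels (one score per class)
        self.conv13 = nn.Conv1d(in_channels=16, out_channels=4, kernel_size=Kernel_size_conv,
                                stride=1024, bias=True)

    def forward(self, X):
        # X: (batch, 1, samples)
        out1 = torch.tanh(self.conv11(X))
        out1 = torch.tanh(self.conv12(out1))
        out1 = self.conv13(out1)  # raw logits; CrossEntropyLoss applies log-softmax itself
        return out1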

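model = AudioClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-5)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(E):
    for batch in range(B):
        optimizer.zero_grad()
        pred_1 = model(training_data)    # logits, shape (N, 4, L_out)
        loss = loss_fn(pred_1, Target)   # Target: class indices (long), shape (N, L_out)
        loss.backward()
        optimizer.step()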
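
The shapes in the snippet above can be checked by hand. The following is a minimal sketch, assuming the output-length formula given in the nn.Conv1d documentation; the kernel sizes tried here are hypothetical, since Kernel_size_conv is not stated in the post.

```python
def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution (formula from the nn.Conv1d docs)."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


L_IN = 480_000  # samples per audio file, as stated in the post

# Hypothetical kernel sizes; the real Kernel_size_conv is not given.
for k in (3, 767, 769):
    # conv11 / conv12: stride 1 with padding (k - 1) // 2 keeps the length for odd k
    same = conv1d_out_len(L_IN, k, stride=1, padding=(k - 1) // 2)
    # conv13: stride 1024, no padding
    out = conv1d_out_len(same, k, stride=1024, padding=0)
    print(f"kernel={k}: after conv11/conv12 = {same}, after conv13 = {out}")
```

Under these assumptions, any odd kernel size up to 767 gives 469 output positions, which matches the stated target length; a larger kernel would give 468 or fewer, and the loss would then fail on a shape mismatch. With 4 output channels the logits have shape (N, 4, 469), so nn.CrossEntropyLoss expects targets of shape (N, 469) holding integer class indices.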

Can anybody suggest another approach to fix this?

What about your training loss and training accuracy? If they are good, then your model is overfitting (which would be surprising given that it is quite simple); if not, then your model is probably too simple for the task.

Training loss is around 0.2 and training accuracy averages 92%.
Currently the data is shuffled every epoch before being passed to the network.

I cannot find any mistake that I might have made in the code.

Are the classes balanced? If you have 1000 classes but 92% of your examples are of one class, then you can get an accuracy of 92% just by always predicting the same class. What kind of errors does your model make? Have a look at the predictions (training and validation) to check that they make sense.
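
One way to run this check is to count the labels and compare the model against a majority-class baseline. This is only a sketch: the function names and the toy label sequences are made up for illustration, and the real targets would be the 469-long parameter sequences.

```python
from collections import Counter


def class_distribution(targets):
    """Fraction of positions carrying each label, over all sequences."""
    counts = Counter(label for seq in targets for label in seq)
    total = sum(counts.values())
    return {label: n / total for label, n in sorted(counts.items())}


def majority_baseline(train_targets, eval_targets):
    """Accuracy obtained by always predicting the most frequent training label."""
    counts = Counter(label for seq in train_targets for label in seq)
    majority = counts.most_common(1)[0][0]
    labels = [label for seq in eval_targets for label in seq]
    return sum(label == majority for label in labels) / len(labels)


# Toy label sequences (hypothetical, not taken from the thread).
train = [[0, 0, 0, 1, 1, 1, 1, 2], [0, 0, 1, 1, 1, 2, 3, 3]]
val = [[1, 1, 0, 2, 1, 1, 3, 0]]

print(class_distribution(train))
print(majority_baseline(train, val))
```

If the baseline accuracy on the training labels is already close to 92%, the reported training accuracy says little about what the network has learned.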

If class imbalance is not an issue, then you might try different techniques against overfitting, including (but not limited to): adding dropout, adding a regularisation term, decreasing the capacity of your network, etc.


Even after reducing the number of parameters, the model starts overfitting after approx. 8-9 epochs and takes a while to bring the training loss down to zero.

Any suggestions on this?
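
When overfitting sets in after a fairly consistent number of epochs, one further option, not named elsewhere in the thread, is early stopping: monitor the validation loss and stop once it has not improved for a few epochs. A minimal sketch follows, in which the class name, the patience value, and the loss values are all made up for illustration.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record val_loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses that start rising after a few epochs.
val_losses = [1.20, 0.95, 0.80, 0.72, 0.70, 0.71, 0.74, 0.78, 0.83]

stopper = EarlyStopping(patience=3)
stopped_epoch = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.step(epoch, val_loss):
        stopped_epoch = epoch
        break

print(stopper.best_epoch, stopper.best_loss, stopped_epoch)
```

In a real training loop one would also save the model weights whenever best_epoch is updated, and restore them after stopping.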

Hard to say without seeing the data. How did you build your test set? Is it a sample from your original dataset or does it come from another source? What about the training set distribution (see my previous question)? How many examples are in each subset? With no information, it’s hard to make recommendations…

Test data was chosen from the same dataset as the training set.
The dataset consists of int16 audio files.
Training set: 83 audio files
Test set: 10 audio files
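
With only 10 held-out files, a single split may give a noisy accuracy estimate. One possible way to use all 93 files, offered here as an assumption rather than something proposed in the thread, is k-fold cross-validation at the file level. The sketch below uses hypothetical file names and a hypothetical helper function.

```python
import random


def k_fold_file_splits(file_ids, k=5, seed=0):
    """Yield (train_files, held_out_files) pairs; each file is held out exactly once."""
    ids = list(file_ids)
    random.Random(seed).shuffle(ids)          # fixed seed for reproducible folds
    folds = [ids[i::k] for i in range(k)]     # k nearly equal-sized folds
    for i in range(k):
        held_out = folds[i]
        train = [f for j, fold in enumerate(folds) if j != i for f in fold]
        yield train, held_out


# Hypothetical file names; 83 training + 10 test files = 93 in total.
files = [f"file_{i:02d}" for i in range(93)]
splits = list(k_fold_file_splits(files, k=5))

for train_files, held_out_files in splits:
    print(len(train_files), len(held_out_files))
```

Splitting by file rather than by individual label keeps all 469 positions of one recording in the same fold, so no recording contributes to both training and evaluation.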