Problem in training a 1D Convolution Network (Overfitting issue)


I am having a problem training a 1D convolution NN. My input has the shape [B, C, W] = [50, 1, 402], i.e. batches of 50 with 1 channel and 402 time samples (each sample is a kind of signal). The problem I am trying to solve is a regression: estimating a single value from the input signal. Each signal has a different target value, and I am training the network to estimate it.

I used the following 1D convolution NN configuration.

import torch

class Simple1DCNN(torch.nn.Module):
    def __init__(self):
        super(Simple1DCNN, self).__init__()
        # Input is [B, 1, 402]; each unpadded conv with kernel_size=5 removes 4 samples
        self.layer1 = torch.nn.Conv1d(in_channels=1, out_channels=10, kernel_size=5)   # -> [B, 10, 398]
        self.act = torch.nn.ReLU()
        self.layer2 = torch.nn.Conv1d(in_channels=10, out_channels=20, kernel_size=5)  # -> [B, 20, 394]
        self.fc1 = torch.nn.Linear(20 * 394, 100)
        self.fc2 = torch.nn.Linear(100, 50)
        self.fc3 = torch.nn.Linear(50, 1)
        self.conv2_drop = torch.nn.Dropout(0.5)

    def forward(self, x):
        x = self.act(self.layer1(x))
        # dropout is applied to the second conv output, before the ReLU
        x = self.act(self.conv2_drop(self.layer2(x)))
        # flatten [B, 20, 394] -> [B, 7880] for the linear layers
        x = x.view(-1, x.shape[1] * x.shape[-1])
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x))
        x = self.fc3(x)      # raw output of the last linear layer (regression value)
        return x
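
As a quick check of the `20*394` input size of `fc1`: with the `Conv1d` defaults (no padding, stride 1, dilation 1), each convolution shortens the signal by `kernel_size - 1` samples. A minimal sketch in plain Python; the helper name `conv1d_out_len` is hypothetical and only used for illustration here:

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution (formula from the torch.nn.Conv1d docs)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

length = 402                                      # time samples per signal
length = conv1d_out_len(length, kernel_size=5)    # after layer1: 398
length = conv1d_out_len(length, kernel_size=5)    # after layer2: 394
flat_features = 20 * length                       # 20 output channels: 7880
print(length, flat_features)
```

If the kernel size or the number of output channels changes, the first argument of `fc1` has to change with it.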

The problem is that the network is overfitting the training set, which can be seen from the results below:

You can see that the training loss is decreasing while the validation loss is not. Here, a tolerance of 10% is defined so that if the output is within 10% of the actual output it is counted as an accurate prediction (this is just for my personal info and has no relation with the training process).
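
The 10% tolerance accuracy described above could be computed roughly like this. This is only a sketch in plain Python: the function name `tolerance_accuracy` and the sample values are made up for illustration, and the original code for this metric is not shown in the thread.

```python
def tolerance_accuracy(predictions, targets, tolerance=0.10):
    """Fraction of predictions within `tolerance` (relative) of the target value."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(
        abs(p - t) <= tolerance * abs(t)
        for p, t in zip(predictions, targets)
    )
    return hits / len(targets)

# Made-up values for illustration only
preds = [1.05, 2.5, 2.9, 4.0]
targets = [1.0, 2.0, 3.0, 5.0]
print(tolerance_accuracy(preds, targets))  # 2 of 4 within 10%
```

Note that a relative tolerance like this becomes very strict for targets close to zero, since the allowed error shrinks with the target.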

I have tried everything I could think of, from normalizing the inputs to randomly splitting the dataset into a training and a validation set (80% / 20%).

Some other useful information:

criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr = 1e-3, momentum = 0.5, weight_decay = 0.01)
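
To make these optimizer settings concrete, here is roughly what one update does to a single scalar parameter, following the update rule described in the `torch.optim.SGD` documentation (with the default dampening of 0 and no Nesterov momentum). This is a plain-Python sketch; `sgd_step` is a hypothetical helper, not a PyTorch function, and the gradient values are made up.

```python
def sgd_step(param, grad, velocity, lr=1e-3, momentum=0.5, weight_decay=0.01):
    """One SGD update for a scalar parameter; returns (new_param, new_velocity)."""
    grad = grad + weight_decay * param        # L2 penalty folded into the gradient
    if velocity is None:                      # first step: buffer starts as the gradient
        velocity = grad
    else:
        velocity = momentum * velocity + grad
    return param - lr * velocity, velocity

w, v = 1.0, None
for g in [0.5, 0.5, 0.5]:                     # made-up constant gradient
    w, v = sgd_step(w, g, v)
print(w, v)
```

So `weight_decay = 0.01` already acts as a form of L2 regularization: it pulls every weight toward zero on each step.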

Can anybody help me with this?

It seems like your linear layers are “too strong” for this task, so the network basically memorizes the training data instead of learning to generalize.
You can try reducing the number of hidden layers.
Edit: also add dropout in between those dense layers, as it should help too.
Edit 2 (kind of off-topic): I also noticed that you don’t have any activation function on your last layer. Is that intentional?
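
One way to see how much capacity sits in the linear layers is to count parameters for the architecture posted above. A small plain-Python sketch; the helper names are hypothetical, and the counts follow from weights plus biases per layer:

```python
def conv1d_params(in_ch, out_ch, kernel_size):
    return in_ch * out_ch * kernel_size + out_ch     # weights + biases

def linear_params(in_features, out_features):
    return in_features * out_features + out_features  # weights + biases

conv_total = conv1d_params(1, 10, 5) + conv1d_params(10, 20, 5)
fc_total = (linear_params(20 * 394, 100)
            + linear_params(100, 50)
            + linear_params(50, 1))
share = fc_total / (conv_total + fc_total)
print(conv_total, fc_total, f"{share:.1%}")
```

The two convolutions have 1,080 parameters, while the three linear layers have 793,201, almost all of them in `fc1`. That is over 99% of the model.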

Could you elaborate on your overfitting? I’m seeing a train accuracy of 43%. Overfitting is when your model fits too closely to your training data (performing amazingly on it) while performing poorly on your test set.

I just want to make sure I’m not misreading your training accuracy, because the way we go about preventing overfitting is different from the way we go about improving model performance in general.

Here are some things I would suggest at first glance:

  • Adjust the kernel size down to 3 (learning over 3 elements is easier than over 5).
  • Adjust your number of channels. Consider “upping” your channels and then “downing” them. That is, maybe a progression like so: 1 -> 8 -> 4. (So three conv1d layers)
  • Adjust or add more linear layers. Currently you’re going from 20*394 = 7880 down to 100, which is a massive reduction. If you add an additional linear layer, you can do something like this instead: 7880 -> 985 (1/8 reduction) -> 50 (~1/20 reduction) -> 1. But if you adjust the number of out channels from the final convolution, these numbers are going to be different.
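
To see how the linear layer sizes move with these changes, here is a small plain-Python sketch. It assumes unpadded, stride-1 convolutions and reads the progression 1 -> 8 -> 4 as two convolutions (1 -> 8, then 8 -> 4); a third conv layer would shorten the signal further. The helper `flattened_size` is hypothetical.

```python
def flattened_size(length, channels, kernel_size):
    """Features entering the first linear layer after a stack of
    unpadded, stride-1 Conv1d layers with the given channel progression."""
    n_convs = len(channels) - 1
    for _ in range(n_convs):
        length -= kernel_size - 1      # each conv removes kernel_size - 1 samples
    return channels[-1] * length

original = flattened_size(402, [1, 10, 20], kernel_size=5)   # 7880
suggested = flattened_size(402, [1, 8, 4], kernel_size=3)    # 1592
first_hidden = suggested // 8                                # 1/8 reduction: 199
print(original, suggested, first_hidden)
```

Under these assumptions the flattened input drops from 7880 to 1592 features, so a 1/8 reduction would give a first hidden layer of 199 units rather than 985.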

I would suggest playing around with the changes above and seeing what works and what doesn’t. Don’t worry about your validation accuracy yet; let’s try to do a good job on the training set first. If we see overfitting afterwards, there are ways to combat it (regularization, fewer parameters, and other techniques too).

Thank you for your suggestions.

I am actually predicting float values rather than labels (like in regression) so I am just using the value provided by the linear layer rather than using any activation.

P.S. see the other replies for the results.

Thank you for your suggestions.

It was working. I just had a bug in my code that calculates the validation accuracy. Here are the results: the MSE loss, the accuracy (assumption: if the output is within 10% of the actual value, it counts as accurate), and the learned weights from the final hidden layer to the output layer, i.e. 50 -> 1. I followed your suggestions to decrease the kernel size and increase the number of neurons in the fully connected layer. It seems to be working.

Since I am posting the results here: does the training/validation procedure look okay? I am comparing float values, which will never be exactly equal, so I believe the fluctuations make sense, right?

Can you provide your view on the results?

Also, here is the result showing how the trained model behaves on a test set:

Is there any way to improve this?