Overfitting in my model when applying it to EEG signals for classification

import torch
import torch.nn as nn


class MSFBCNN(nn.Module):

    def __init__(self, input_dim, output_dim, FT=10):
        super(MSFBCNN, self).__init__()
        # input_dim is assumed to be (channels, time samples); self.C was
        # used below but never defined in the posted code.
        self.C = input_dim[0]
        self.T = input_dim[1]
        self.FT = FT
        self.D = 1
        self.FS = self.FT * self.D
        self.output_dim = output_dim

        # Parallel temporal convolutions
        self.conv1a = nn.Conv2d(1, self.FT, (1, 65), padding=(0, 32), bias=False)
        self.conv1b = nn.Conv2d(1, self.FT, (1, 41), padding=(0, 20), bias=False)
        self.conv1c = nn.Conv2d(1, self.FT, (1, 27), padding=(0, 13), bias=False)
        self.conv1d = nn.Conv2d(1, self.FT, (1, 17), padding=(0, 8), bias=False)
        # The second positional argument of BatchNorm2d is eps, so the
        # posted BatchNorm2d(..., False) set eps to 0; use the default.
        self.batchnorm1 = nn.BatchNorm2d(4 * self.FT)
        self.pooling1 = nn.AvgPool2d(kernel_size=(1, 16), stride=(1, 2), padding=(0, 0))

        # Spatial convolution
        self.conv2 = nn.Conv2d(4 * self.FT, self.FS, (self.C, 1), padding=(0, 0), groups=1, bias=False)
        self.batchnorm2 = nn.BatchNorm2d(self.FS)

        # Temporal average pooling
        self.pooling2 = nn.AvgPool2d(kernel_size=(1, 32), stride=(1, 4), padding=(0, 0))

        # forward() calls self.drop, which was missing from the posted code;
        # p=0.5 is an assumed value.
        self.drop = nn.Dropout(0.5)

        # input_size=131 is the number of time steps left after both pooling
        # layers, which holds for T = 1125 (any T from 1118 to 1125).
        self.lstm = nn.LSTM(input_size=131, hidden_size=512, num_layers=2, batch_first=True)
        self.fc1 = nn.Linear(512, self.output_dim)

    def forward(self, x):
        # x: (batch, 1, C, T)

        # Layer 1
        x1 = self.conv1a(x)
        x2 = self.conv1b(x)
        x3 = self.conv1c(x)
        x4 = self.conv1d(x)
        x = torch.cat([x1, x2, x3, x4], dim=1)
        x = self.batchnorm1(x)
        x = self.pooling1(x)
        x = self.drop(x)

        # Layer 2
        x = torch.pow(self.batchnorm2(self.conv2(x)), 2)
        x = self.pooling2(x)
        # x = torch.log(x)
        x = self.drop(x)

        # (batch, FS, 1, 131) -> (batch, FS, 131); with batch_first=True the
        # LSTM treats the FS feature maps as the sequence dimension.
        x = x.view(x.size(0), x.size(1), x.size(3))

        # The initial hidden and cell states default to zeros on the same
        # device as x, so the explicit .cuda() tensors are not needed.
        x, _ = self.lstm(x)

        # FC layer on the last step of the sequence
        x = x[:, -1, :]
        x = self.fc1(x)
        return x

Can anybody tell me how to reduce overfitting? I have tried changing every hyperparameter. @ptrblck

Hi Abdullah!

First a word on “overfitting:” In my mind, overfitting means that your
out-of-sample prediction performance (e.g., on your validation or test
set) is actually getting worse, even as your training-set performance
is getting better. It is common for the training-set performance to be
better than that for the validation (or test) set, sometimes significantly.
But I don’t call it overfitting until the out-of-sample performance starts
to degrade with further training.
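
To make that definition concrete, here is a minimal sketch of how you might spot the point where out-of-sample performance starts to degrade. The function name and the `patience` of 3 epochs are my own assumptions for illustration, not part of any library; it only looks at a list of per-epoch validation losses that you would record yourself.

```python
def best_epoch_before_overfitting(val_losses, patience=3):
    """Return the index of the best epoch once the validation loss has
    failed to improve on it for `patience` epochs, else None."""
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < val_losses[best_epoch]:
            # Validation loss is still improving.
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: treat as overfitting.
            return best_epoch
    return None


# Hypothetical validation losses: improving until epoch 2, then degrading.
val_losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.8]
stop_at = best_epoch_before_overfitting(val_losses)  # 2
```

If this returns `None`, the validation loss has not yet started to degrade, and a gap between training and validation performance alone is not what I would call overfitting.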

The best way to address overfitting – if you have the data and resources
to train with a large training set – is to use a larger training set.

Next is data augmentation, where you semi-artificially increase the size
of your training set by generating “new” samples, for example by flipping
images left-to-right or cropping them or adding noise to them. (But you
can only push augmentation so far – as you start to generate lots of
“augmentation” samples from a single “real” sample, you’ll hit diminishing
returns.)

Adding Dropout layers to your model is an established technique for
reducing overfitting.
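
One detail worth checking when you use Dropout: it behaves differently in training and evaluation mode, so validation numbers should be measured after calling `.eval()` on the model. A small sketch of the difference (the tensor size and `p=0.5` are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: each element is zeroed with probability p and the
# survivors are scaled by 1 / (1 - p), here 2.0.
drop.train()
y_train = drop(x)

# Evaluation mode: Dropout is the identity.
drop.eval()
y_eval = drop(x)
```

Calling `model.train()` or `model.eval()` on the whole model switches every Dropout (and BatchNorm) layer inside it at once.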

Weight regularization, such as using the weight_decay parameter in
various pytorch optimizers, is used to reduce overfitting.
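
As a minimal sketch, this is how `weight_decay` is passed to an optimizer. The `nn.Linear` model is only a stand-in, and the learning rate and decay values are assumptions you would have to tune for your own problem:

```python
import torch
import torch.nn as nn

# Stand-in model; replace with your own network.
model = nn.Linear(8, 2)

# weight_decay adds an L2 penalty on the parameters to each update.
# The values 1e-3 and 1e-4 are illustrative, not recommendations.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```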

There is some lore that batch normalization, for example, pytorch’s
BatchNorm2d, can reduce overfitting, although I’ve seen this argued
both ways.

Also, smaller models (fewer parameters) are in general less prone to
overfitting, but even when well trained, smaller models may not perform
as well on your use case as a better-suited, but larger, model would.
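
To get a feel for model size, you can count the trainable parameters. The sketch below compares an LSTM configured like the one in the posted model with a much smaller one; the smaller configuration (`hidden_size=64`, one layer) is purely a hypothetical alternative, and whether it performs well enough is something only an experiment can tell you.

```python
import torch.nn as nn


def count_parameters(module):
    """Number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Same configuration as the LSTM in the posted model.
big_lstm = nn.LSTM(input_size=131, hidden_size=512, num_layers=2, batch_first=True)

# Hypothetical smaller alternative, for comparison only.
small_lstm = nn.LSTM(input_size=131, hidden_size=64, num_layers=1, batch_first=True)

n_big = count_parameters(big_lstm)      # 3,422,208
n_small = count_parameters(small_lstm)  # 50,432
```

The same helper works on the whole model, which lets you see which part of the network holds most of the parameters.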

Good luck!

K. Frank

Thank you for the reply, @KFrank. How can I do data augmentation on EEG signals?

Hi Abdullah!

I don’t really know anything about EEGs, but I think you could try:

Add some modest noise to the signals.

Rescale the strength of the signals. I would imagine that you could
use modestly different rescalings on the different channels, and maybe
have the rescalings vary slowly over time.

Shift / crop the signals in a time window. So let’s say you have 10-minute
EEG recordings. Maybe you could randomly crop them to 8-minute
windows, and train on the 8-minute samples.

I don’t know how much the exact rhythm matters, but perhaps you could
rescale the time axis a little, perhaps in a time-dependent way.
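
The four ideas above can be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the `(channels, time)` tensor layout, and all of the strengths (5% noise, 10% gain and time changes) are made up, and this version uses one constant gain per channel and one uniform time stretch rather than slowly varying ones. Whether any of these actually preserves the information that matters in an EEG is something you would need to check.

```python
import torch
import torch.nn.functional as F


def add_noise(x, rel_std=0.05):
    # Gaussian noise with a std equal to a fraction of each channel's std.
    return x + rel_std * x.std(dim=-1, keepdim=True) * torch.randn_like(x)


def rescale_channels(x, max_change=0.1):
    # One random gain per channel in [1 - max_change, 1 + max_change].
    u = torch.rand(x.size(0), 1, device=x.device, dtype=x.dtype)
    return x * (1.0 + max_change * (2.0 * u - 1.0))


def stretch_time(x, max_change=0.1):
    # Resample the time axis by one random factor (output length changes).
    factor = 1.0 + max_change * (2.0 * torch.rand(1).item() - 1.0)
    new_len = int(round(x.size(-1) * factor))
    return F.interpolate(x.unsqueeze(0), size=new_len, mode="linear",
                         align_corners=False).squeeze(0)


def random_crop(x, crop_len):
    # Pick a random window of crop_len samples along the time axis.
    start = torch.randint(0, x.size(-1) - crop_len + 1, (1,)).item()
    return x[..., start:start + crop_len]


def augment(x, crop_len):
    # crop_len must not exceed the shortest possible stretched length.
    x = stretch_time(x)
    x = random_crop(x, crop_len)
    x = rescale_channels(x)
    return add_noise(x)


# Hypothetical recording: 22 channels, 1125 time samples.
eeg = torch.randn(22, 1125)
sample = augment(eeg, crop_len=1000)  # shape (22, 1000)
```

You would apply `augment` on the fly to each training sample, so the model sees a slightly different version every epoch, and leave the validation and test samples unmodified (apart from cropping to the same length).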

The idea is that you want to create new “augmentation” samples from
your original samples in which specific irrelevant details are modified,
while maintaining the important characteristics. So a trained “EEG
reader” looking at a modified sample should come to the same
conclusion as he would looking at the original unmodified sample
(e.g., seizure vs. no seizure).

The idea is that overfitting occurs when the model learns random details
of specific samples, rather than the general properties it’s supposed to
learn.

Let’s say that you are training a cat / dog image classifier, and one of
your dog training images has a big, black dog nose on the left edge
of the image. A model that is overfitting might learn that a black blob
on the left edge of the image means “dog,” but that’s just an artifact
of a particular training sample. If you augment by flipping the image
left-to-right, it will still be a dog (and look like one to a person), but the
model can’t use left-edge-blob to mean dog. Or if you augment by
cropping the image, and snip the dog’s nose off (but the rest of the
dog is still in the image and still looks like a dog), the model can’t use
black blob (absent the rest of the dog) to mean dog.

So, how can you make modified versions of your training-set EEG traces
that preserve all (or most) of the important diagnostic information, but
mess up or hide sample-specific irrelevant details in order to prevent
your model from learning the irrelevant details in lieu of the features that
actually matter?


K. Frank