CNN low accuracy results

LukaMlink · March 31, 2024, 1:50pm

Hi, I am working on a classification CNN model on the CIFAR10 dataset. My net looks like this:

self.conv1 = Conv2d(in_channels=num_channels, out_channels=16, kernel_size=(5, 5))
self.bn1 = nn.BatchNorm2d(num_features=16) # Corrected the argument here
self.relu1 = ReLU()
self.maxpool1 = MaxPool2d(kernel_size=(2, 2), stride=(2, 2))

self.conv2 = Conv2d(in_channels=16, out_channels=32, kernel_size=(5, 5))
self.bn2 = nn.BatchNorm2d(num_features=32) # Ensure this matches the output channels of conv2
self.relu2 = ReLU()
self.maxpool2 = MaxPool2d(kernel_size=(2, 2), stride=(2, 2))

self.fc1 = Linear(in_features=32 * 5 * 5, out_features=120) # fully connected layer
self.relu3 = ReLU()
self.fc2 = Linear(in_features=120, out_features=84) # fully connected layer 2
self.relu4 = ReLU()
self.fc3 = Linear(in_features=84, out_features=num_classes) # fully connected layer 3
self.logSoftmax = LogSoftmax(dim=1) # log-softmax over the class dimension
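
As a rough check on the flattened size feeding fc1, here is a minimal sketch that assembles the layers above into a module and pushes a dummy batch through it. The class name `SmallNet`, the `forward` method, and the 3x32x32 input shape are assumptions, not part of the original post. With unpadded 5x5 convolutions and two 2x2 poolings, a 32x32 image shrinks to 28, 14, 10, then 5 pixels per side, so for such inputs the flattened size works out to 32 * 5 * 5 = 800.

```python
import torch
from torch import nn


class SmallNet(nn.Module):  # hypothetical name for the network in the post
    def __init__(self, num_channels=3, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(num_channels, 16, kernel_size=5),  # 32 -> 28
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),       # 28 -> 14
            nn.Conv2d(16, 32, kernel_size=5),            # 14 -> 10
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),       # 10 -> 5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                # (N, 32, 5, 5) -> (N, 800)
            nn.Linear(32 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),
            nn.LogSoftmax(dim=1),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


model = SmallNet()
dummy = torch.randn(4, 3, 32, 32)            # assumed CIFAR10 batch shape
feature_shape = model.features(dummy).shape  # shape right before flattening
output_shape = model(dummy).shape
print(feature_shape, output_shape)
```

If the images were a different size, the printed feature shape would show what `in_features` of the first linear layer has to be.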

I train for 10 epochs with a learning rate of 0.001, a batch size of 32, and the Adam optimizer.
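
A minimal sketch of that training setup, assuming a stand-in linear model and random tensors in place of the real network and the CIFAR10 loader (both hypothetical). Because the network ends in LogSoftmax, the matching loss is `nn.NLLLoss`; `nn.CrossEntropyLoss` applies log-softmax itself and expects raw logits instead.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in model (assumption): any module ending in LogSoftmax is used the same way.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10), nn.LogSoftmax(dim=1))

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.NLLLoss()  # expects log-probabilities, which LogSoftmax produces

# One random batch of 32 fake images and labels, standing in for a DataLoader batch.
images = torch.randn(32, 3, 32, 32)
labels = torch.randint(0, 10, (32,))

losses = []
for step in range(5):
    optimizer.zero_grad()                 # clear gradients from the previous step
    log_probs = model(images)
    loss = criterion(log_probs, labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses)
```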

I get an accuracy of around 50%, and I am not sure what my next steps should be to improve the model's accuracy.

Edit: I added batch normalization and the accuracy went up to 70%. How can I improve it further?

Thanks

UMAR_MASUD · April 1, 2024, 3:57am

Hi Luka, I believe some of the ways you can improve your accuracy results are:

Increase the complexity of the network. Add more CNN layers with more filters to your custom model.
Try using residual connections or attention mechanisms.
Use transfer learning, where you take a large pre-trained model and fine-tune it on your data.
Apply data augmentation techniques to improve generalisation and performance.
Optimising hyperparameters such as the number of epochs, learning rate, batch size, and optimiser can also help.
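
To illustrate the data augmentation suggestion, here is a minimal sketch of two augmentations commonly used on CIFAR10: a random horizontal flip and a random crop after zero-padding. It uses plain tensor operations so it stays self-contained; torchvision's `RandomHorizontalFlip` and `RandomCrop(32, padding=4)` transforms cover the same ground. The function name `augment_batch` and the 4-pixel padding are assumptions.

```python
import torch
import torch.nn.functional as F


def augment_batch(images, pad=4):
    """Randomly flip and crop a batch of images shaped (N, C, H, W)."""
    n, _, h, w = images.shape
    # Horizontal flip for roughly half of the samples.
    flip = torch.rand(n) < 0.5
    images = torch.where(flip[:, None, None, None], images.flip(dims=[3]), images)
    # Zero-pad the borders, then cut a random HxW window per sample.
    padded = F.pad(images, (pad, pad, pad, pad))
    out = torch.empty_like(images)
    for i in range(n):
        top = int(torch.randint(0, 2 * pad + 1, (1,)))
        left = int(torch.randint(0, 2 * pad + 1, (1,)))
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out


batch = torch.randn(8, 3, 32, 32)   # stand-in for a CIFAR10 batch
augmented = augment_batch(batch)
print(augmented.shape)
```

Augmentation like this would normally be applied to training batches only, never to the validation or test set.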

Some SOTA methods for CIFAR10 are given here - https://myrtle.ai/learn/how-to-train-your-resnet/ and this bag of tricks - https://myrtle.ai/learn/how-to-train-your-resnet-8-bag-of-tricks/
I hope this is useful!
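
Of the suggestions in this post, residual connections are probably the easiest to try in isolation. The following is a minimal sketch of a basic residual block, assuming the input and output have the same number of channels so the skip connection is a plain addition. The class name `ResidualBlock` is illustrative and does not come from the linked articles.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Two 3x3 conv layers whose output is added back to the block input."""

    def __init__(self, channels):
        super().__init__()
        # padding=1 keeps the spatial size, so input and output can be summed.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # skip connection


block = ResidualBlock(32)
x = torch.randn(4, 32, 10, 10)
y = block(x)
print(y.shape)
```

Because the block preserves shape, it can be stacked after any convolution stage that produces the same channel count.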

iman.abduljaleel · April 1, 2024, 4:42pm

Could you give me example code for an attention mechanism and show how I could use it in a CNN model?

Ayush_Aditya · April 1, 2024, 5:30pm

I suggest trying this architecture:

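import torch
import torch.nn as nn
import torch.nn.functional as F


class pre(nn.Module):

    def __init__(self, num_classes=10, dropout_p=0.5):
        # dropout_p is accepted but not applied anywhere in this model.
        super(pre, self).__init__()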
        self.conv1 = nn.Conv2d(3, 128, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(128, 128, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(128, 128, kernel_size=3, padding=1)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv4 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
        self.conv6 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv7 = nn.Conv2d(256, 512, kernel_size=3, padding=0)
        self.conv8 = nn.Conv2d(512, 256, kernel_size=1, padding=0)
        self.conv9 = nn.Conv2d(256, 128, kernel_size=1, padding=0)
        self.avg_pool = nn.AvgPool2d(kernel_size=6)
    
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):
        x = F.leaky_relu(self.conv1(x), negative_slope=0.1)
        x = F.leaky_relu(self.conv2(x), negative_slope=0.1)
        x = F.leaky_relu(self.conv3(x), negative_slope=0.1)

        x = self.pool1(x)

        x = F.leaky_relu(self.conv4(x), negative_slope=0.1)
        x = F.leaky_relu(self.conv5(x), negative_slope=0.1)
        x = F.leaky_relu(self.conv6(x), negative_slope=0.1)
    
        x = self.pool2(x)

        x = F.leaky_relu(self.conv7(x), negative_slope=0.1)
        x = F.leaky_relu(self.conv8(x), negative_slope=0.1)
        x = F.leaky_relu(self.conv9(x), negative_slope=0.1)
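        x = self.avg_pool(x)        # (N, 128, 6, 6) -> (N, 128, 1, 1) for 32x32 inputs
        x = x.view(x.size(0), -1)   # flatten to (N, 128)
        # Keep the batch dimension first: nn.Linear expects the 128 features last.
        x = self.fc(x)              # (N, num_classes)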
    
        return x
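
To see why `AvgPool2d(kernel_size=6)` fits this network, the spatial size can be traced layer by layer with the standard output-size formula, floor((size + 2 * padding - kernel) / stride) + 1. The sketch below assumes 32x32 CIFAR10 inputs; the helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                      # assumed CIFAR10 height/width
trace = [size]
for _ in range(3):                             # conv1..conv3: 3x3, padding=1
    size = conv_out(size, kernel=3, padding=1)
size = conv_out(size, kernel=2, stride=2)      # pool1
trace.append(size)
for _ in range(3):                             # conv4..conv6: 3x3, padding=1
    size = conv_out(size, kernel=3, padding=1)
size = conv_out(size, kernel=2, stride=2)      # pool2
trace.append(size)
size = conv_out(size, kernel=3)                # conv7: 3x3, no padding
size = conv_out(size, kernel=1)                # conv8: 1x1
size = conv_out(size, kernel=1)                # conv9: 1x1
trace.append(size)
size = conv_out(size, kernel=6, stride=6)      # avg_pool (stride defaults to kernel size)
trace.append(size)
print(trace)  # [32, 16, 8, 6, 1]
```

With 128 channels coming out of conv9 and a 1x1 spatial size after pooling, the flattened tensor has shape (N, 128), which is what `nn.Linear(128, num_classes)` expects.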

iman.abduljaleel · April 1, 2024, 10:07pm

Thank you for your code, but could you explain what it does and how I can choose the right CNN layers for my application? I am new to deep learning.

iman.abduljaleel · April 1, 2024, 10:15pm

Could you also help me add an attention mechanism to a CNN layer?

Ayush_Aditya · April 2, 2024, 3:17am

Actually, there is no formula for finding out which architecture will work best; it comes with practice. Just keep a few things in mind while experimenting:
If the accuracy is low, try things like increasing or decreasing the learning rate, or increasing model complexity by adding more layers. For reference, look at research papers and Kaggle to see which architectures work best.
If the training accuracy is high but the validation accuracy is much lower, the model is overfitting, so reduce the complexity…
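
The two rules of thumb above can be sketched as a small helper that compares training and validation accuracy. The function name `diagnose` and the `target` and `gap` thresholds are arbitrary illustrative choices, not established values.

```python
def diagnose(train_acc, val_acc, target=0.90, gap=0.10):
    """Rough rule of thumb; `target` and `gap` are illustrative thresholds only."""
    if train_acc - val_acc > gap:
        # Model fits the training set far better than unseen data.
        return "overfitting: add regularisation or augmentation, or reduce model size"
    if train_acc < target:
        # Model cannot even fit the training set well.
        return "underfitting: increase capacity, train longer, or tune the learning rate"
    return "reasonable fit"


print(diagnose(0.99, 0.70))
print(diagnose(0.55, 0.50))
print(diagnose(0.95, 0.93))
```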

iman.abduljaleel · April 2, 2024, 4:56am

Thank you so much, sir.

UMAR_MASUD · April 2, 2024, 3:22pm

Hi, you can refer to this repo to get a list of plug-and-play attention mechanisms you can add to any CNN-based model. It is just a layer that you would insert between the CNN layers. Some prominent examples are SE-Net, ECA-Net, CBAM, etc.

Repo: pprp/awesome-attention-mechanism-in-cv on GitHub (Awesome List of Attention Modules and Plug&Play Modules in Computer Vision)
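
As an illustration of what such a plug-and-play layer can look like, here is a minimal sketch of a Squeeze-and-Excitation style block in the spirit of SE-Net. It is not code from the repo above; the class name `SEBlock` and the reduction ratio of 8 are assumptions. The block averages each channel to a single number, passes that vector through a small two-layer network, and rescales the channels by the resulting weights.

```python
import torch
from torch import nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation: rescale each channel by a learned weight in (0, 1)."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: (N, C, H, W) -> (N, C, 1, 1)
        self.fc = nn.Sequential(                     # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(n, c))   # (N, C)
        return x * weights.view(n, c, 1, 1)          # broadcast over H and W


# Inserted after a conv stage; 16 -> 32 channels mirrors conv2 in the first post.
stage = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=5),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    SEBlock(32),
)
out = stage(torch.randn(4, 16, 14, 14))
print(out.shape)
```

Since the block leaves the tensor shape unchanged, it can be dropped between existing layers without touching the rest of the network.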

iman.abduljaleel · April 2, 2024, 9:18pm

Thank you so much, sir.