ResNet50 torchvision implementation gives low accuracy on CIFAR-10

I am new to deep learning and PyTorch. I am using the pretrained ResNet-50 model from torchvision on CIFAR-10, and the test accuracy is very low. Is there something wrong with my code?

import torchvision
import torch
import torch.nn as nn
from torch import optim
import os
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import numpy as np
from collections import OrderedDict
import matplotlib.pyplot as plt

transformations=transforms.Compose([transforms.ToTensor(),transforms.Normalize([0.485, 0.456, 0.406],[0.229, 0.224, 0.225])])
trainset=torchvision.datasets.CIFAR10(root='./CIFAR10',download=True,transform=transformations,train=True)

testset=torchvision.datasets.CIFAR10(root='./CIFAR10',download=True,transform=transformations,train=False)

trainloader=DataLoader(dataset=trainset,batch_size=4)
testloader=DataLoader(dataset=testset,batch_size=4)

inputs,labels=next(iter(trainloader))
labels=labels.float()
inputs.size()

print(labels.type())
resnet=torchvision.models.resnet50(pretrained=True)

if torch.cuda.is_available():
  resnet=resnet.cuda()
  inputs,labels=inputs.cuda(),torch.Tensor(labels).cuda()

outputs=resnet(inputs)
outputs.size()

for param in resnet.parameters():
  param.requires_grad=False

numft=resnet.fc.in_features
print(numft)
resnet.fc=torch.nn.Sequential(nn.Linear(numft,1000),nn.ReLU(),nn.Linear(1000,10))
resnet.cuda()
resnet.train(True)
optimizer=torch.optim.SGD(resnet.parameters(),lr=0.001,momentum=0.9)
criterion=nn.CrossEntropyLoss()

for epoch in range(5):
    resnet.train(True)

    trainloss=0
    correct=0
    for x,y in trainloader:
        x,y=x.cuda(),y.cuda()
        optimizer.zero_grad()
        
        yhat=resnet(x)
        loss=criterion(yhat,y)
        
        loss.backward()
        optimizer.step()
        trainloss+=loss.item()
        
        
    
    print('Epoch: {} Loss: {}'.format(epoch,(trainloss/len(trainloader))))
    
    accuracy=[]
    running_corrects=0.0
    for x_test,y_test in testloader:
        
        x_test,y_test=x_test.cuda(),y_test.cuda()
        yhat=resnet(x_test)
        _,z=yhat.max(1)
        running_corrects += torch.sum(y_test == z)
        
    accuracy.append(running_corrects/len(testloader))

print(running_corrects/len(testloader))
accuracy=max(accuracy)
print(accuracy)

OUTPUT AFTER TRAINING/TESTING

Epoch: 0 Loss: 1.9808503997325897
Epoch: 1 Loss: 1.7917569598436356
Epoch: 2 Loss: 1.624434965057373
Epoch: 3 Loss: 1.4082191940283775
Epoch: 4 Loss: 1.1343850775527955
tensor(1.1404, device='cuda:0')
tensor(1.1404, device='cuda:0')

I understood my mistakes with some help from Stack Overflow (https://stackoverflow.com/questions/61901144/resnet50-torchvision-implementation-gives-low-accuracy-on-cifar-10). I am posting the solution as a checklist, in case someone else runs into the same problem in the future.

  1. The accuracy may be low simply because 5 epochs is too few; try training for longer.

  2. Try fine-tuning the backbone (the feature extractor) as well, by setting param.requires_grad=True for all parameters of the ResNet model. The pretrained weights were learned on ImageNet and need to adapt to CIFAR-10.

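  3. When calculating the accuracy, divide the number of correct predictions by the number of samples, len(testloader.dataset), and not by the number of batches, len(testloader). With batch_size=4, the printed value 1.1404 is the average number of correct predictions per batch, which is roughly 28.5% accuracy.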

  4. To speed up testing, wrap the test loop in with torch.no_grad(): and put the model in eval mode, which can be done with model.eval() or model.train(False). Eval mode also makes BatchNorm layers use their running statistics instead of per-batch statistics.

  5. Check for any imbalances in the CIFAR-10 dataset.
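
Points 3 and 4 of the checklist can be sketched as a small evaluation helper. This is only a minimal sketch, not the code from the original post: the `evaluate` function and the toy data are hypothetical, and `nn.Identity()` stands in for a real model so that the predictions are known in advance.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device):
    """Return top-1 accuracy as a fraction of samples (0.0 to 1.0)."""
    model.eval()  # BatchNorm uses running statistics, dropout is disabled
    correct = 0
    with torch.no_grad():  # no autograd bookkeeping while testing
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            pred = model(x).argmax(dim=1)
            correct += (pred == y).sum().item()
    # Divide by the number of samples, not by the number of batches.
    return correct / len(loader.dataset)


# Hypothetical stand-in data: every input row is already a vector of 10
# "logits", so nn.Identity() behaves like a classifier with known outputs.
labels = torch.arange(10)
logits = torch.eye(10)
logits[7:] = torch.eye(10)[0]  # last 3 samples now predict class 0 (wrong)
loader = DataLoader(TensorDataset(logits, labels), batch_size=4)

accuracy = evaluate(nn.Identity(), loader, torch.device("cpu"))
print(f"accuracy: {accuracy:.2f}")
```

In this toy setup 7 of the 10 samples are classified correctly, so the helper returns 0.7. Dividing the same count by len(loader), which is 3 batches here, would report about 2.33, the same kind of greater-than-one "accuracy" as in the output above.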

I ran into a similar problem: the official torchvision ResNet implementation gives low accuracy on the CIFAR-10 dataset. I expected > 90% top-1 accuracy, but every time I got < 90%.

It may be too late to answer, but I am leaving a record here so that others do not run into the same problem.

Now I know the reason. I recommend using this ResNet implementation for CIFAR-10.

Let me call the official torchvision ResNet implementation resnet_official and the linked implementation resnet_github.

resnet_official’s first layer, conv1, is defined by:

self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)

With kernel_size=7, stride=2 and padding=3, this layer halves the height and width of the input image right away. That is fine for ImageNet, the original target dataset of the official torchvision implementation, whose inputs are fairly large (224 * 224). For CIFAR, whose inputs are small (32 * 32), it throws away a lot of information from the input image. So I guess resnet_official is not a good fit for datasets with small inputs.
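
To make the size argument concrete, here is a rough sketch of the arithmetic using the standard convolution output-size formula. The helper names are made up for illustration. The stride-2 max-pool after conv1 and the three stride-2 stages (layer2 to layer4) are not shown in this post; they are taken from how torchvision's ResNet is laid out.

```python
def conv_out(size, kernel, stride, padding):
    """Output width of a conv or pooling layer (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def torchvision_resnet_sizes(size):
    """Feature-map width after each stage of a torchvision-style ResNet."""
    sizes = {"input": size}
    size = conv_out(size, kernel=7, stride=2, padding=3)  # conv1
    sizes["conv1"] = size
    size = conv_out(size, kernel=3, stride=2, padding=1)  # maxpool
    sizes["maxpool"] = size
    sizes["layer1"] = size  # layer1 keeps the resolution (stride 1)
    for name in ("layer2", "layer3", "layer4"):  # each stage halves it
        size = conv_out(size, kernel=3, stride=2, padding=1)
        sizes[name] = size
    return sizes


imagenet = torchvision_resnet_sizes(224)
cifar = torchvision_resnet_sizes(32)
print("ImageNet:", imagenet)
print("CIFAR:   ", cifar)
```

For a 32 * 32 input, conv1 leaves 16 * 16, the max-pool leaves 8 * 8, and layer4 ends at a single 1 * 1 position, whereas a 224 * 224 input still has 7 * 7 positions at that point.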

On the other hand, resnet_github’s first layer, conv1, is defined by:

self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)

kernel_size=3, stride=1 and padding=1 keep the original input size for quite a while along the forward path. This is probably more effective for datasets with small inputs.

I can easily reach >90% on CIFAR-10 with the linked implementation.


Thank you for the answer! I will definitely try out the implementation in the link provided.