Classification results are inconsistent with test accuracy on CIFAR10 with a custom CNN

Hello everyone,

I’m trying to run classification on the CIFAR10 dataset using a custom CNN which looks like this:

import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        # Three strided 3x3 convs downsample 32x32 -> 15x15 -> 7x7 -> 3x3,
        # then a final 3x3 conv maps to 10 class logits (3x3 -> 1x1).
        self.conv = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, stride=2),
                                  nn.BatchNorm2d(16), nn.ReLU(inplace=True),
                                  nn.Conv2d(16, 32, kernel_size=3, stride=2),
                                  nn.BatchNorm2d(32), nn.ReLU(inplace=True),
                                  nn.Conv2d(32, 64, kernel_size=3, stride=2),
                                  nn.BatchNorm2d(64), nn.ReLU(inplace=True),
                                  nn.Conv2d(64, 10, kernel_size=3),
                                  nn.BatchNorm2d(10), nn.Flatten())

    def forward(self, x):
        x = self.conv(x)
        return x

Training and testing go well, and I’m getting around 73% accuracy. I then save the model with torch.save(model.state_dict(), save_path).

From another script I load the model like this:

def load_model(path):
    model = Net()
    # map_location="cpu" may be needed here if the checkpoint was saved
    # on a GPU machine and is loaded on a CPU-only one.
    model.load_state_dict(torch.load(path))
    print(model)
    model.eval()  # disable batchnorm/dropout training behavior
    return model

The thing is, I want to run classification on CIFAR10 directly from image files, without using the DataLoader or Dataset classes. I saved images from the DataLoader into class_i.png files and then load them like this:

import os
import re

import cv2

def load_imgs(path):
    imgs = []
    labels = []
    for img_file in os.listdir(path):
        # The label is the part of the filename before the first underscore.
        label = re.search(r"\w*(?=_)", img_file).group(0)
        img_path = os.path.join(path, img_file)
        # cv2.imread already returns a numpy array (in BGR channel order).
        img = cv2.imread(img_path, cv2.IMREAD_COLOR)
        imgs.append(img)
        labels.append(label)
    return imgs, labels

Finally, I run classification on these images by first converting them to tensors:

import torch
import torchvision

def classification(model, imgs):
    if torch.cuda.is_available():
        print(f"Using CUDA device {torch.cuda.get_device_name(0)}")
        device = torch.device("cuda:0")
    else:
        print("No CUDA device found, using CPU")
        device = torch.device("cpu")
    model = model.to(device)  # move the model once, not per image
    to_tensor = torchvision.transforms.ToTensor()
    tensor_imgs = [to_tensor(img).float() for img in imgs]
    pred = []
    with torch.no_grad():
        for img in tensor_imgs:
            img = img.to(device)
            # Add a batch dimension: (C, H, W) -> (1, C, H, W).
            output = model(img[None, ...])
            pred.append(output.argmax(dim=1).item())
    return pred

But when I compare the predicted classes with the original labels, I get less than 25% accuracy. My guess is that the problem lies in how I prepare the images for inference. Unfortunately, I am strictly limited to the OpenCV library for loading the images.

What would be the correct way to run classification on imported images without using data loaders?

Loading images directly without a DataLoader should work as long as you apply the same processing (you wouldn’t need the data augmentation).
What kind of transformations did you use during training, i.e. did you resize the images, etc.?
To isolate the issue to the data loading, you could load some images directly during training and compare the predictions to those for the same images returned by the DataLoader (comparing the raw tensor values could also give you more information about the differences).
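A minimal sketch of that comparison (the dataset root and the img_0.png filename are placeholders, and ToTensor stands in for whatever transform both pipelines share):

import cv2
import torch
import torchvision
import torchvision.transforms as transforms

to_tensor = transforms.ToTensor()

# Sample returned by the standard pipeline (no random augmentation).
dataset = torchvision.datasets.CIFAR10(root="./data", train=False,
                                       transform=to_tensor, download=True)
tensor_a, _ = dataset[0]

# The same sample loaded back from disk via OpenCV.
img = cv2.imread("img_0.png", cv2.IMREAD_COLOR)
tensor_b = to_tensor(img)

# Identical pipelines should produce (nearly) identical raw values;
# a large per-channel difference could hint at a channel-order (BGR/RGB) mixup.
print(torch.allclose(tensor_a, tensor_b, atol=1e-6))
print((tensor_a - tensor_b).abs().max())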

These are the transforms that I apply for training and validation:

from torchvision import transforms

transform_train = transforms.Compose([
    transforms.RandomAffine(5, translate=(0.1, 0.1)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
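(For reference, Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) maps each channel from [0, 1] to [-1, 1] via (x - 0.5) / 0.5, so skipping it at inference feeds the model inputs on a different scale than it saw during training.)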

I realized I made a mistake by not using the same transforms for inference. Applying transform_test to the images before inference now gives me around 50% accuracy on a run of 200 random images, which is roughly the figure I got after the first epoch of training.

This is how my code looks right now:

def classification(model, imgs):
    if torch.cuda.is_available():
        print(f"Using CUDA device {torch.cuda.get_device_name(0)}")
        print(DIV)  # DIV is a separator string defined elsewhere in the script
        device = torch.device("cuda:0")
    else:
        print("No CUDA device found, using CPU")
        print(DIV)
        device = torch.device("cpu")

    model = model.to(device)  # move the model once, not per image
    # Apply the same transforms as at test time, then add a batch dimension.
    tensor_imgs = [transform_test(img).unsqueeze(0) for img in imgs]
    pred = []
    with torch.no_grad():
        for img in tensor_imgs:
            img = img.to(device)
            output = model(img)
            pred.append(output.argmax(dim=1).item())
    return pred

I didn’t quite understand your suggestion: should I save the images that I use for training and run inference on them for comparison? For training I am using a DataLoader for both the training and test datasets.

This would be one possibility to further isolate it.
Issues like these are often caused by either a different data loading and processing pipeline (majority of the issues) or by an invalid model loading (e.g. using strict=False as a “workaround” to load an invalid state_dict).
To further isolate the root cause I would suggest starting with the data loading and verifying that the training and test scripts create the “same” input data. Note that the tensors wouldn’t be exactly the same if the training pipeline uses random transformations, so for debugging purposes you could disable them.
The model can be checked by comparing the output for a static input (e.g. torch.ones) and making sure both scripts return an allclose output.
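A minimal sketch of that check (assuming the Net definition from above; "model.pth" and "ref_out.pt" are placeholder paths):

import torch

# Run this identical snippet in both the training and the inference script.
model = Net()
model.load_state_dict(torch.load("model.pth"))
model.eval()

with torch.no_grad():
    out = model(torch.ones(1, 3, 32, 32))  # fixed, deterministic input

# In script A: torch.save(out, "ref_out.pt")
# In script B: print(torch.allclose(out, torch.load("ref_out.pt")))
print(out)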

I managed to find my error. It was due to the color channel order: OpenCV loads images as BGR rather than RGB. Converting the images with cv2.cvtColor(img, cv2.COLOR_BGR2RGB) fixed it.
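For reference, the corrected loading step in load_imgs:

img = cv2.imread(img_path, cv2.IMREAD_COLOR)  # OpenCV returns BGR
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)    # match the RGB order used in training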