Prediction showing wrong results even with good training and validation performance

I am using the skin cancer classification competition data from Kaggle. There are 4 labels and the dataset is heavily imbalanced. I trained a ResNet-18 model with a 10-fold cross-validation split, giving each fold around 2 epochs. The code is attached below.
The model reached 98.2% accuracy with a loss of 0.07 on the training data, and 98.1% accuracy with a loss of 0.06 on the validation data, so that part seemed fine.
The problem is prediction.py (code attached below): whenever I try to predict, the model keeps returning [0], even for images taken from the training data.

Is there something wrong with my code?

Expected result:
given an image as input, the output should be one of 0, 1, 2 or 3
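
(For reference, a minimal way to inspect the raw class probabilities for a single image; this is only a sketch, the image path and checkpoint path below are placeholders and not part of my actual code:)

import cv2
import numpy as np
import torch
import torch.nn as nn
import albumentations as A
from torchvision import models

# sketch: look at the full probability vector instead of only the argmax
model = models.resnet18(pretrained=False)
model.fc = nn.Linear(model.fc.in_features, 4)                               # 4 classes, as in training
model.load_state_dict(torch.load("best_model_4.bin", map_location="cpu"))  # placeholder checkpoint path
model.eval()

transform = A.Compose([A.Resize(384, 384), A.Normalize()])   # same preprocessing as transforms_val
img = cv2.imread("some_train_image.jpg")                     # placeholder image path
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)                   # cv2 loads BGR
img = transform(image=img)["image"].astype(np.float32)
x = torch.tensor(img.transpose(2, 0, 1)).unsqueeze(0)        # shape (1, 3, 384, 384)

with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)
print(probs)              # if one class always dominates, the model or preprocessing is suspect
print(probs.argmax(1))    # predicted label, 0-3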

model.py (where the training happens)

import pandas as pd 
import numpy as np

import torch
import torch.nn as nn

import os

import cv2

import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

import albumentations as A

from torch.utils.data import TensorDataset, DataLoader,Dataset
from torchvision import models
from collections import defaultdict
from torch.utils.data.sampler import RandomSampler
import torch.optim as optim
from torch.optim import lr_scheduler
from sklearn import model_selection
from tqdm import tqdm
import gc


# generate data from csv file
class Build_dataset(Dataset):
    def __init__(self, csv, split, mode, transform=None):
        self.csv = csv.reset_index(drop=True)
        self.split = split
        self.mode = mode
        self.transform = transform

    def __len__(self):
        return self.csv.shape[0]

    def __getitem__(self, index):
        row = self.csv.iloc[index]

        image = cv2.imread(row.filepath)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # cv2 loads images as BGR; convert to RGB

        if self.transform is not None:
            res = self.transform(image=image)
            image = res['image'].astype(np.float32)
        else:
            image = image.astype(np.float32)

        image = image.transpose(2, 0, 1)
        data = torch.tensor(image).float()

        if self.mode == 'test':
            return data
        else:
            return data, torch.tensor(self.csv.iloc[index].target).long()

# training data
def train_epoch(model, loader, optimizer, loss_fn, device, scheduler, n_examples):

    model = model.train()

    losses = []
    correct_predictions = 0

    for inputs, labels in tqdm(loader):
        inputs = inputs.to(device)
        labels = labels.to(device)

        outputs = model(inputs)

        _, preds = torch.max(outputs, dim=1)
        loss = loss_fn(outputs, labels)

        correct_predictions += torch.sum(preds == labels)
        losses.append(loss.item())

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # here you delete inputs and labels and then use gc.collect
        del inputs, labels
        gc.collect()

    return correct_predictions.double() / n_examples, np.mean(losses)

# validation data 
def val_epoch(model, loader,loss_fn, device,n_examples):

    model = model.eval()

    losses = []
    correct_predictions = 0

    with torch.no_grad():
        for inputs, labels in tqdm(loader):
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = model(inputs)
            _, preds = torch.max(outputs, dim=1)
            loss = loss_fn(outputs, labels)
            correct_predictions += torch.sum(preds == labels)
            losses.append(loss.item())
            # here you delete inputs and labels and then use gc.collect
            del inputs, labels
            gc.collect()
        

    return correct_predictions.double() / n_examples, np.mean(losses)

        


def train(fold, model, device, num_epochs):

    df_train = df[df.kfold != fold].reset_index(drop=True)
    df_valid = df[df.kfold == fold].reset_index(drop=True)
    # generate data
    dataset_train = Build_dataset(df_train,  'train', 'train', transform=transforms_train)
    dataset_valid = Build_dataset(df_valid, 'train', 'val', transform=transforms_val)

    #load data 
    train_loader = DataLoader(dataset_train, batch_size=64, sampler=RandomSampler(dataset_train), num_workers=4)
    valid_loader = DataLoader(dataset_valid, batch_size=32, shuffle=True, num_workers=4)

    dataset_train_size = len(dataset_train)

    dataset_valid_size = len(dataset_valid)

    optimizer = optim.Adam(model.parameters(), lr = 1e-4)

    model = model.to(device)

    scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, patience=3, threshold=0.001, mode='max')

    loss_fn = nn.CrossEntropyLoss().to(device)

    history = defaultdict(list)

    best_accuracy = 0.0

    for epoch in range(num_epochs):
        print(f'Epoch {epoch+1} / {num_epochs}')
        print ('-'*30)
    
        train_acc, train_loss = train_epoch(model, train_loader, optimizer, loss_fn, device, scheduler, dataset_train_size)
        print(f'Train loss {train_loss} accuracy {train_acc}')
        valid_acc, valid_loss = val_epoch(model, valid_loader, loss_fn, device,dataset_valid_size)
        print(f'Val   loss {valid_loss} accuracy {valid_acc}')
        print()
    
        history['train_acc'].append(train_acc)
        history['train_loss'].append(train_loss)
        history['val_acc'].append(valid_acc)
        history['val_loss'].append(valid_loss)
    
        if valid_acc > best_accuracy:
            print('saving model')
            torch.save(model.state_dict(), f'best_model_{fold}.bin')
            best_accuracy = valid_acc
    
    print(f'Best Accuracy: {best_accuracy}')

    model.load_state_dict(torch.load(f'best_model_{fold}.bin'))

    return model, history



if __name__ == '__main__':
    #competition data -2020
    data_dir = "../input/jpeg-melanoma-384x384"
    #competition data - 2019
    data_dir2 = "../input/jpeg-isic2019-384x384"
    # device
    device = torch.device("cuda")

    # augmenting images


    image_size = 384
    transforms_train = A.Compose([
        A.Transpose(p=0.5),
        A.VerticalFlip(p=0.5),
        A.HorizontalFlip(p=0.5),
        A.RandomBrightness(limit=0.2, p=0.75),
        A.RandomContrast(limit=0.2, p=0.75),
        A.OneOf([
            A.MedianBlur(blur_limit=5),
            A.GaussianBlur(blur_limit=5),
            A.GaussNoise(var_limit=(5.0, 30.0)),
        ], p=0.7),

        A.OneOf([
            A.OpticalDistortion(distort_limit=1.0),
            A.GridDistortion(num_steps=5, distort_limit=1.),
            A.ElasticTransform(alpha=3),
        ], p=0.7),

        A.CLAHE(clip_limit=4.0, p=0.7),
        A.HueSaturationValue(hue_shift_limit=10, sat_shift_limit=20, val_shift_limit=10, p=0.5),
        A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1, rotate_limit=15, border_mode=0, p=0.85),
        A.Resize(image_size, image_size),
        A.Cutout(max_h_size=int(image_size * 0.375), max_w_size=int(image_size * 0.375), num_holes=1, p=0.7),
        A.Normalize()
    ])

    transforms_val = A.Compose([
        A.Resize(image_size, image_size),
        A.Normalize()
    ])
    # create data
    df_train = pd.read_csv(os.path.join(data_dir, "train.csv"))  #/kaggle/input/siim-isic-melanoma-classification/train.csv
    df_train.head()

    df_train['is_ext'] = 0
    df_train['filepath'] = df_train['image_name'].apply(lambda x: os.path.join(data_dir, 'train', f'{x}.jpg'))

    # dataset from 2020 data
    df_train['diagnosis'] = df_train['diagnosis'].apply(lambda x: x.replace('seborrheic keratosis', 'BKL'))
    df_train['diagnosis'] = df_train['diagnosis'].apply(lambda x: x.replace('lichenoid keratosis', 'BKL'))
    df_train['diagnosis'] = df_train['diagnosis'].apply(lambda x: x.replace('solar lentigo', 'BKL'))
    df_train['diagnosis'] = df_train['diagnosis'].apply(lambda x: x.replace('lentigo NOS', 'BKL'))
    df_train['diagnosis'] = df_train['diagnosis'].apply(lambda x: x.replace('cafe-au-lait macule', 'unknown'))
    df_train['diagnosis'] = df_train['diagnosis'].apply(lambda x: x.replace('atypical melanocytic proliferation', 'unknown'))

        
    # shuffle data
    df = df_train.sample(frac=1).reset_index(drop=True)

    # map each diagnosis string to an integer target value
    new_target = {d: idx for idx, d in enumerate(sorted(df.diagnosis.unique()))}
    df['target'] = df['diagnosis'].map(new_target)
    mel_idx = new_target['melanoma']

    # creating 10 fold cross validation data (df was already shuffled above and has the target column)
    df['kfold'] = -1
    y = df.target.values
    kf = model_selection.StratifiedKFold(n_splits=10, shuffle=True)
    for fold, (train_idx, val_idx) in enumerate(kf.split(X=df, y=y)):
        df.loc[val_idx, 'kfold'] = fold

    df = df[['filepath','diagnosis', 'target', 'is_ext', 'kfold']]

    class_names = list(df['diagnosis'].unique())


    # create model

    def create_model(n_classes):
        model = models.resnet18(pretrained=True)

        n_features = model.fc.in_features
        model.fc = nn.Linear(n_features, n_classes)
        return model.to(device)
    
    base_model = create_model(len(class_names)) # model ready
    
    
    
    # run the model
    for i in range(10):
        #train
        base_model, history = train(i, base_model, device, num_epochs = 2) # train data

prediction.py

from torchvision import models
import torch 
import torch.nn as nn
import albumentations as A
import cv2
import os 
import numpy as np

device = torch.device("cuda")
MODEL = None
MODEL_PATH = "../input/prediction/best_model_4.bin"


def create_model(n_classes):
    model = models.resnet18(pretrained=True)

    n_features = model.fc.in_features
    model.fc = nn.Linear(n_features, n_classes)
    return model.to(device)
# generate the data to tensor with transform application

# converting the image to tensor by using the transforms function

class get_image:
    def __init__(self, image_path, targets, transform = None):
        self.image_path = image_path
        self.targets = targets
        self.transform = transform

    def __len__(self):
        return len(self.image_path)
    def __getitem__(self, item):
        targets = self.targets[item]
        resize = 384
        image = cv2.imread(self.image_path[item])
        image = cv2.resize(image, (resize, resize))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # cv2 loads images as BGR; convert to RGB
    
        if self.transform is not None:
            res = self.transform(image = image)
            image = res['image'].astype(np.float32)
        image = image.transpose(2, 0, 1)
        data = torch.tensor(image).float()
        targets = torch.tensor(targets)
    
        return data, targets
    
# load the data by using torch data
# predict values 

# predict function
def predict(image_path, model, model_path):
    image_size = 384

    transforms_val = A.Compose([
        A.Resize(image_size, image_size),
        A.Normalize()
    ])

    test_images = [image_path]
    test_targets = [0]

    test_data = get_image(
        image_path = test_images,
        targets = test_targets,
        transform=transforms_val)
    # loading the data
    test_dataloader = torch.utils.data.DataLoader(test_data, batch_size=1, shuffle=False, num_workers=0)
    model = create_model(n_classes = 4)
    model.load_state_dict(torch.load(model_path))
    model.to(device)
    model.eval()
    prediction = []

    with torch.no_grad():
        for test_data, test_target in test_dataloader:
            test_data = test_data.to(device)
            test_target = test_target.to(device)
        
            outputs = model(test_data)
            _,preds = torch.max(outputs.cpu(), 1)
        
            #prediction.extend(preds)
        
            prediction = np.vstack((preds)).ravel()
        
            return prediction
        
def upload_predict():
    image_file = "../input/whatever/ISIC_0075663.jpg"

    if image_file:
        pred = predict(image_file, MODEL, MODEL_PATH)
        print(pred)
    
    return pred

The labels and their counts are given below:

3    27126
2     5193
1      584
0      223

Here 0 is the malignant (cancer) class, and the other labels are different diagnosis types.
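
(Given this imbalance, one common mitigation, which the code above does not use, is to weight the loss by inverse class frequency; a minimal sketch using the counts from the table:)

import numpy as np
import torch
import torch.nn as nn

# sketch: weight CrossEntropyLoss by inverse class frequency
counts = np.array([223, 584, 5193, 27126], dtype=np.float32)   # classes 0-3, from the table above
weights = counts.sum() / (len(counts) * counts)                 # rarer classes get larger weights
class_weights = torch.tensor(weights, dtype=torch.float32)

loss_fn = nn.CrossEntropyLoss(weight=class_weights)             # would replace the unweighted loss in train()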

Here is the link to the data: https://www.kaggle.com/cdeotte/jpeg-melanoma-384x384

Hey Aniruddh,

Why is the return statement inside the for loop? (file: prediction.py)


Hey Dexter,

Oops! I guess there was a mistake in copy-pasting the code that caused the spacing issue. Sorry about that. Strangely, though, I am unable to edit the main post.

Hey Aniruddh,

It’s okay. As far as I can see you are stacking the preds, but with which other variable is the question. Also, it looks like the prediction variable is set to a new value on each iteration instead of accumulating. Just tell me one thing: if you uncomment the previous line (prediction.extend(preds)) and comment out the vstack line, is the result still the same?
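
(What I mean by accumulating, as a rough sketch reusing your variable names, not your exact code:)

import numpy as np
import torch

# sketch: collect predictions across all batches and return only after the loop
def predict_all(model, loader, device):
    model.eval()
    prediction = []
    with torch.no_grad():
        for data, _ in loader:
            outputs = model(data.to(device))
            _, preds = torch.max(outputs.cpu(), 1)
            prediction.extend(preds.tolist())   # keep appending instead of overwriting
    return np.array(prediction)                 # return once, after the loop finishes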

Hey Dexter,

Yes, the result is still the same even when the vstack line is commented out. I guess both lines have the same goal of storing the predicted value. But I don’t really understand the part where you said the prediction variable is set to a new value after each iteration.

Also, I am not too sure the imbalanced data is being handled properly. It would be great to get an opinion on the training part (model.py) overall, just to rule out anything fishy. Thanks a lot for the help.

In your model.py, try zeroing the gradients before doing any calculation, i.e. call optimizer.zero_grad() at the start of each iteration, before the forward and backward passes, and see whether that changes anything.
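
(Roughly this ordering inside train_epoch; a sketch that assumes the same model, loader, optimizer, loss_fn and device as in the code above:)

for inputs, labels in tqdm(loader):
    inputs = inputs.to(device)
    labels = labels.to(device)

    optimizer.zero_grad()              # clear old gradients first
    outputs = model(inputs)            # forward pass
    loss = loss_fn(outputs, labels)
    loss.backward()                    # backward pass
    optimizer.step()                   # update the weights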