Accuracy of Neural Network random

Hey there,
I managed to train my neural network properly, but the given score function doesn’t work on my trained network. I get a score of 0,028 which is 1/36, and since my output layer consists of 36 output the network is basically guessing. I would highly appreciate if someone finds an error in my code to make it work properly.
Thanks alot in advance!

Sign Language Dataset
The Sign Language Dataset consists of 9680 grayscale images of hand signs for the digits 0-9 and the alphabets a-z. Thus, this is a multiclass classification problem with 36 classes. Your task is to build a machine learning model that can accurately classify images from this dataset.
Loading the dataset
You do not need to upload any data. Both the visible training dataset and the hidden test dataset are already available on the Jupyter hub.
In [152]:
import os
import csv
import cv2
import random
import numpy as np
import matplotlib.pyplot as plt
In [151]:
# Setting the path of the training dataset (that was already provided to you)

running_local = True if os.getenv('JUPYTERHUB_USER') is None else False
DATASET_PATH = "."

# Set the location of the dataset
if running_local:
    # If running on your local machine, the sign_lang_train folder's path should be specified here
    local_path = "sign_lang_train"
    if os.path.exists(local_path):
        DATASET_PATH = local_path
else:
    # If running on the Jupyter hub, this data folder is already available
    # You DO NOT need to upload the data!
    DATASET_PATH = "/data/mlproject22/sign_lang_train"
In [150]:
# Utility function

def read_csv(csv_file):
    with open(csv_file, newline='') as f:
        reader = csv.reader(f)
        data = list(reader)
    return data
Data Loading using PyTorch
For creating and training your model, you can work with any machine learning library of your choice.
If you choose to work with PyTorch, you will need to create your own Dataset class for loading the data. This is provided below. See here for a nice example of how to create a custom data loading pipeline in PyTorch.
In [149]:
import torch
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision import transforms, utils, io
from torchvision.utils import make_grid

from string import ascii_lowercase

class SignLangDataset(Dataset):
    """Sign language dataset"""

    def __init__(self, csv_file, root_dir, class_index_map=None, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied on a sample.
        """
        self.data = read_csv(os.path.join(root_dir,csv_file))
        self.root_dir = root_dir
        self.class_index_map = class_index_map
        self.transform = transform
        # List of class names in order
        self.class_names = list(map(str, list(range(10)))) + list(ascii_lowercase)

    def __len__(self):
        """
        Calculates the length of the dataset-
        """
        return len(self.data)

    def __getitem__(self, idx):
        """
        Returns one sample (dict consisting of an image and its label)
        """
        if torch.is_tensor(idx):
            idx = idx.tolist()

        # Read the image and labels
        image_path = os.path.join(self.root_dir, self.data[idx][1])
        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        # Shape of the image should be H,W,C where C=1
        image = np.expand_dims(image, 0)
        # The label is the index of the class name in the list ['0','1',...,'9','a','b',...'z']
        # because we should have integer labels in the range 0-35 (for 36 classes)
        label = self.class_names.index(self.data[idx][0])
                
        sample = {'image': image, 'label': label}

        #if self.transform:
        #    sample = self.transform(sample)

        return sample
Prediction Stub
You will need to provide a function that can be used to make predictions using your final trained model.
IMPORTANT
The name of your prediction function must be leader_board_predict_fn
Your prediction function should be able take as input a 4-D numpy array of shape [batch_size,1,128,128] and produce predictions in the form of a 1-D numpy array of shape [batch_size,].
Predictions for each image should be an integer in the range 0-35, that is 0 for the digit  0
0
 , 1 for the digit  1
1
 , .... , 9 for the digit  9
9
 , 10 for the letter  𝑎
a
 , 11 for the letter  𝑏
b
 , ..., 35 for the letter  𝑧
z
 .
Your prediction function should internally load your trained model and take care of any data transformations that you need.
Below we provide an implementation of the leader_board_predict_fn function, in which we show how a trained model can be loaded (from the weights saved on the disk) for making predictions. This example is for PyTorch, but you are free to use any framework of your choice for your model. The only requirement is that this function should accept a numpy array (with the proper shape) as the input and should produce a numpy array (with the proper shape) as the output. What you do internally is up to you.
Note that the model that we load here is not properly trained and so its performance is very bad. This example is only for showing you how a model can be loaded in PyTorch and how predictions can be made.
In [153]:
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Net, self).__init__()
        self.hidden_size = hidden_size
        self.input_size = input_size
        self.output_size = output_size
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        
        x = x.view(-1, self.input_size)
        x = F.relu(self.fc1(x))  
        x = F.relu(self.fc2(x))  
        x = self.fc3(x)          
        return x
In [154]:
# Define hyperparameters
LEARNING_RATE = 0.001
MOMENTUM = 0.9
NUM_EPOCHS = 5
HIDDEN_SIZE = 300
TRAIN_BATCH_SIZE = 64
INPUT_SIZE = 128*128
OUTPUT_SIZE = 36

# Set dataset path
dataset_path = DATASET_PATH


# Create a Dataset object
sign_lang_dataset = SignLangDataset(csv_file="labels.csv", root_dir=dataset_path)

# Create a Dataloader
sign_lang_dataloader = DataLoader(sign_lang_dataset, 
                                  batch_size=64,
                                  shuffle=True, 
                                  drop_last=True,
                                  num_workers=0)
In [155]:
num_train_batches = len(sign_lang_dataloader)
print(num_train_batches)
151
In [156]:
import torch.optim as optim
import torch.nn.functional as F
from torchsummary import summary
from tqdm import tqdm
from torchvision import datasets, transforms

net = Net(INPUT_SIZE, HIDDEN_SIZE, OUTPUT_SIZE)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net_mnist.parameters(), lr=LEARNING_RATE, momentum=MOMENTUM)

summary(net, input_size=(INPUT_SIZE,), device="cpu")
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Linear-1                  [-1, 300]       4,915,500
            Linear-2                  [-1, 300]          90,300
            Linear-3                   [-1, 36]          10,836
================================================================
Total params: 5,016,636
Trainable params: 5,016,636
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.06
Forward/backward pass size (MB): 0.00
Params size (MB): 19.14
Estimated Total Size (MB): 19.20
----------------------------------------------------------------
In [157]:
def train_neural_network_pytorch_minibatch(net, train_loader, optimizer, criterion, num_epochs):
    
    net.train()  # Set the network in training mode   
    
    with tqdm(total=num_epochs * len(train_loader)) as pbar:
        for epoch in range(num_epochs):
            for i, batch in enumerate(train_loader):
                pbar.update(1)
                x,labels = batch["image"], batch["label"]
                x,labels = x.float(), labels.long()
                x *= 255.0/x.max() # normalize pixels (x) to [0:1]
                x = x.view(-1, 128*128)
                outputs = net(x)
                loss = criterion(outputs, labels)
                loss.backward()
                optimizer.step()
                optimizer.zero_grad()
    return net
In [158]:
train_neural_network_pytorch_minibatch(net, sign_lang_dataloader, optimizer, criterion, NUM_EPOCHS)
100%|██████████| 755/755 [04:08<00:00,  3.04it/s]
In [144]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from tqdm import tqdm

def leader_board_predict_fn(input_batch):
    """
    Function for making predictions using your trained model.
    
    Args:
        input_batch (numpy array): Input images (4D array of shape 
                                   [batch_size, 1, 128, 128])
        
    Returns:
        output (numpy array): Predictions of the your trained model 
                             (1D array of int (0-35) of shape [batch_size, ])
    """
    prediction = None
    batch_size, channels, height, width = input_batch.shape
    
    # Set the network to evaluation mode
    net.eval()
    
    # VERY IMPORTANT
    # Convert the input batch to a torch Tensor and set
    # the data type to the same type as the network
    input_batch = torch.from_numpy(input_batch).float()
       
    # A forward pass with the input batch produces a batch of logits
    # In the network that we use here, Softmax is not applied to the output
    # This may be different for your network.
    logits = net(input_batch)
    
    # Final classification predictions are taken by taking an argmax over the logits
    # The prediction is converted to a numpy array
    prediction = torch.argmax(logits, dim=1).numpy()
    
    assert prediction is not None, "Prediction cannot be None"
    assert isinstance(prediction, np.ndarray), "Prediction must be a numpy array"

    return prediction
Evaluation
Your final model will be evaluated on a hidden test set containing images similar to the dataset that you are provided with.
For evaluating the performance of your model, we will use the normalized accuracy_score metric from sklearn. This is simply the percentage of correct predictions that your model makes for all the images of the hidden test set. Hence, if all the predictions are correct, the score is 1.0 and if all predictions are incorrect, the score is 0.0. We will use the sklearn metric so that the accuracy function is agnostic to the machine learning framework you use.
In [145]:
from sklearn.metrics import accuracy_score
  
def accuracy(dataset_path, max_batches=30):
    """
    Calculates the average prediction accuracy.
    
    IMPORTANT
    =========
    In this function, we use PyTorch only for loading the data. When your `leader_board_predict_fn`
    function is called, we pass the arguments to it as numpy arrays. The output of `leader_board_predict_fn`
    is also expected to be a numpy array. So, as long as your `leader_board_predict_fn` function takes
    numpy arrays as input and produces numpy arrays as output (with the proper shapes), it does not
    matter what framework you used for training your network or for producing your predictions.
    
    Args:
        dataset_path (str): Path of the dataset directory
        
    Returns:
        accuracy (float): Average accuracy score over all images (float in the range 0.0-1.0)
    """

    # Create a Dataset object
    sign_lang_dataset = SignLangDataset(csv_file="labels.csv", root_dir=dataset_path)

    # Create a Dataloader
    sign_lang_dataloader = DataLoader(sign_lang_dataset, 
                                      batch_size=64,
                                      shuffle=True, 
                                      drop_last=True,
                                      num_workers=0)
    
    # Calculate accuracy for each batch
    accuracies = list()
    for batch_idx, sample in enumerate(sign_lang_dataloader):
        x = sample["image"].numpy()
        y = sample["label"].numpy()
        prediction = leader_board_predict_fn(x)
        accuracies.append(accuracy_score(y, prediction, normalize=True))
        
        # We will consider only the first 30 batches
        if batch_idx == (max_batches - 1):
            break

    assert len(accuracies) == max_batches
    
    # Return the average accuracy
    mean_accuracy = np.mean(accuracies)
    return mean_accuracy
We will now use your leader_board_predict_fn function for calculating the accuracy of your model. We provide the code for testing your loaded model on the visible training data. We will also evaluate your model's performance on the test dataset (the test dataset should only be used for evaluation and is NOT to be used for training your model).
In [159]:
def get_score():
    """
    Function to compute scores for train and test datasets.
    """
    import torch
    import numpy as np
    from sklearn.metrics import accuracy_score
    import os
    import pwd
    import time
    import pathlib
    import pandas as pd
    import datetime
    
    ### LEADER BOARD TEST
    seed = 200

    torch.manual_seed(seed)
    np.random.seed(seed)

    # Calculate the accuracy on the training dataset
    # to check that your `leader_board_predict_fn` function 
    # works without any error
    dataset_score = accuracy(dataset_path=DATASET_PATH)

    assert isinstance(dataset_score, float), f"type of dataset_score is {type(dataset_score)}, but it must be float"
    assert 0.0<=dataset_score<=1.0, f"Value of dataset_score is {dataset_score}, but it must be between 0.0 and 1.0"

    # This is your accuracy score on the visible training dataset
    # This is NOT used for the leaderboard.
    print(f"Accuracy score on training data: {dataset_score}")

    # There is a hidden test that will evaluate your trained model on the hidden test set
    # This hidden dataset and the accuracy for this will not be visible to you when you
    # validate this notebook. The accuracy score on the hidden dataset will be used
    # for calculating your leaderboard score.

    seed = 200

    torch.manual_seed(seed)
    np.random.seed(seed)

    user_id = pwd.getpwuid( os.getuid() ).pw_name
    curtime = time.time()
    dt_now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M")

    try:  # 
        HIDDEN_DATASET_PATH = os.path.expanduser("/data/mlproject22-test-data/sign_lang_test")
        hiddendataset_score = accuracy(dataset_path=HIDDEN_DATASET_PATH)

        assert isinstance(hiddendataset_score, float), f"type of dataset_score is {type(dataset_score)}, but it must be float"
        assert 0.0<=hiddendataset_score<=1.0, f"Value of dataset_score is {dataset_score}, but it must be between 0.0 and 1.0"

        print(f"Leaderboard score: {hiddendataset_score}")

        score_dict = dict(
            score_hidden=hiddendataset_score,
            score_train=dataset_score,
            unixtime=curtime,
            user=user_id,
            dt=dt_now,
            comment="",
        )

    except Exception as e:
        err = str(e)
        score_dict = dict(
            score_hidden=float("nan"),
            score_train=dataset_score,
            unixtime=curtime,
            user=user_id,
            dt=dt_now,
            comment=err
        )


    #if list(pathlib.Path(os.getcwd()).parents)[0].name == 'source':
    #    print("we are in the source directory... replacing values.")
    #    print(pd.DataFrame([score_dict]))
    #    score_dict["score_hidden"] = -1
    #    score_dict["score_train"] = -1
    #    print("new values:")
    #    print(pd.DataFrame([score_dict]))

    pd.DataFrame([score_dict]).to_csv("sign_lang.csv", index=False)

    ### LEADER BOARD TEST
    
get_score()
Accuracy score on training data: 0.028645833333333332
Leaderboard score: 0.043229166666666666```

I don’t see any obvious issues in your code so would recommend to try to overfit a small dataset first (e.g. just 10 samples) by playing around with some hyperparameters and make sure your model is able to do so. Once done you could then scale up the use case again.