Error when trying to create logits

cs_student · July 4, 2022, 4:49pm

Hello everyone,

During a course called “Machine Learning”, I was given the exercise of building a neural network that can recognize sign language pictures. Training the network using gradient descent works, but the function “leader_board_predict_fn()” throws an error when I try to evaluate the accuracy of my network. Any help is much appreciated, since I’m just starting out with Python, Torch and other libraries.

Thanks a lot in advance!
Code:

Sign Language Dataset
The Sign Language Dataset consists of 9680 grayscale images of hand signs for the digits 0-9 and the letters a-z. Thus, this is a multiclass classification problem with 36 classes. Your task is to build a machine learning model that can accurately classify images from this dataset.
Loading the dataset
You do not need to upload any data. Both the visible training dataset and the hidden test dataset are already available on the Jupyter hub.
In [31]:
import os
import csv
import cv2
import random
import numpy as np
import matplotlib.pyplot as plt
In [32]:
# Setting the path of the training dataset (that was already provided to you)

running_local = True if os.getenv('JUPYTERHUB_USER') is None else False
DATASET_PATH = "."

# Set the location of the dataset
if running_local:
    # If running on your local machine, the sign_lang_train folder's path should be specified here
    local_path = "sign_lang_train"
    if os.path.exists(local_path):
        DATASET_PATH = local_path
else:
    # If running on the Jupyter hub, this data folder is already available
    # You DO NOT need to upload the data!
    DATASET_PATH = "/data/mlproject22/sign_lang_train"
In [33]:
# Utility function

def read_csv(csv_file):
    with open(csv_file, newline='') as f:
        reader = csv.reader(f)
        data = list(reader)
    return data
Data Loading using PyTorch
For creating and training your model, you can work with any machine learning library of your choice.
If you choose to work with PyTorch, you will need to create your own Dataset class for loading the data. This is provided below. See here for a nice example of how to create a custom data loading pipeline in PyTorch.
In [88]:
import torch
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision import transforms, utils, io
from torchvision.utils import make_grid

from string import ascii_lowercase

class SignLangDataset(Dataset):
    """Sign language dataset"""

    def __init__(self, csv_file, root_dir, class_index_map=None, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied on a sample.
        """
        self.data = read_csv(os.path.join(root_dir,csv_file))
        self.root_dir = root_dir
        self.class_index_map = class_index_map
        self.transform = transform
        # List of class names in order
        self.class_names = list(map(str, list(range(10)))) + list(ascii_lowercase)

    def __len__(self):
        """
        Returns the number of samples in the dataset.
        """
        return len(self.data)

    def __getitem__(self, idx):
        """
        Returns one sample (dict consisting of an image and its label)
        """
        if torch.is_tensor(idx):
            idx = idx.tolist()

        # Read the image and labels
        image_path = os.path.join(self.root_dir, self.data[idx][1])
        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        # Shape of the image should be C,H,W where C=1 (the channel axis is added in front)
        image = np.expand_dims(image, 0)
        # The label is the index of the class name in the list ['0','1',...,'9','a','b',...'z']
        # because we should have integer labels in the range 0-35 (for 36 classes)
        label = self.class_names.index(self.data[idx][0])
                
        sample = {'image': image, 'label': label}

        #if self.transform:
        #    sample = self.transform(sample)

        return sample
In [106]:
import glob
csv_file = os.path.basename(glob.glob(f"{DATASET_PATH}/*.csv")[-1])

# Create dataset
sign_lang_dataset = SignLangDataset(csv_file=csv_file, root_dir=DATASET_PATH)

# Create dataloader
# Check the documentation here: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
# Here we use a batch size of 64; set this value appropriately while training
batch_size = 64
sign_lang_dataloader = DataLoader(dataset=sign_lang_dataset, batch_size=batch_size, shuffle=True, drop_last=True)

print(f'Number of batches: {len(sign_lang_dataloader)}')

for data in sign_lang_dataloader:
    print(f'Shape of image batch: {data["image"].shape}')
    print(f'Len of labels in batch: {len(data["label"])}')
    
    # This shows the ground truth label for each image in the batch
    print(f'Example labels: {data["label"]}')
    
    # We exit the loop after getting the stats from 1 batch
    break

    
# Now let us display the images and labels from the first batch

for batch_num, data in enumerate(sign_lang_dataloader):
    
    fig, axes = plt.subplots(1, batch_size, figsize=(20,4))
    
    for i, ax in enumerate(axes.flatten()):
        ax.imshow(data["image"][i,0], cmap="bone")
        ax.set_title(f"Label: {data['label'][i]} \n shape: {list(data['image'][i].shape)}")
        
    break
Number of batches: 151
Shape of image batch: torch.Size([64, 1, 128, 128])
Len of labels in batch: 64
Example labels: tensor([19, 21, 35, 10,  9,  3, 18,  0, 34, 30,  4,  9,  8,  4, 19, 17, 35, 12,
        31, 17, 22, 32, 25, 12, 24, 21,  6,  8, 11, 11, 31,  4, 16, 21, 21, 25,
        16, 28, 25, 34, 17, 27,  6,  4, 31,  0,  6, 10, 25, 34, 28, 12, 13, 18,
        21, 21, 30, 25, 13, 18,  4, 31, 28, 33])

In [36]:
# torch.autograd.Variable is deprecated; tensors can be used directly
labels = data['label'].long()
print(labels)
tensor([12, 35,  6, 20, 35, 21, 34, 21])
Prediction Stub
You will need to provide a function that can be used to make predictions using your final trained model.
IMPORTANT
The name of your prediction function must be leader_board_predict_fn
Your prediction function should be able to take as input a 4-D numpy array of shape [batch_size,1,128,128] and produce predictions in the form of a 1-D numpy array of shape [batch_size,].
Predictions for each image should be an integer in the range 0-35, that is 0 for the digit 0, 1 for the digit 1, ..., 9 for the digit 9, 10 for the letter a, 11 for the letter b, ..., 35 for the letter z.
Your prediction function should internally load your trained model and take care of any data transformations that you need.
Below we provide an implementation of the leader_board_predict_fn function, in which we show how a trained model can be loaded (from the weights saved on the disk) for making predictions. This example is for PyTorch, but you are free to use any framework of your choice for your model. The only requirement is that this function should accept a numpy array (with the proper shape) as the input and should produce a numpy array (with the proper shape) as the output. What you do internally is up to you.
Note that the model that we load here is not properly trained and so its performance is very bad. This example is only for showing you how a model can be loaded in PyTorch and how predictions can be made.
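The label encoding described above can be sketched in a few lines. This is only an illustration: the helper names `char_to_index` and `index_to_char` are hypothetical and not part of the course notebook, while the class order mirrors `class_names` in `SignLangDataset`.

```python
from string import ascii_lowercase

# Class order as in SignLangDataset.class_names: '0'-'9' followed by 'a'-'z'
class_names = [str(d) for d in range(10)] + list(ascii_lowercase)

def char_to_index(char):
    """Hypothetical helper: map a class name such as 'a' to its integer label."""
    return class_names.index(char)

def index_to_char(index):
    """Hypothetical helper: map an integer label in 0-35 back to its class name."""
    return class_names[index]

print(len(class_names))      # 36
print(char_to_index("9"))    # 9
print(char_to_index("a"))    # 10
print(index_to_char(35))     # z
```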
In [107]:
# Set dataset path
dataset_path = DATASET_PATH


# Create a Dataset object
sign_lang_dataset = SignLangDataset(csv_file="labels.csv", root_dir=dataset_path)

# Create a Dataloader
sign_lang_dataloader = DataLoader(sign_lang_dataset, 
                                  batch_size=64,
                                  shuffle=True, 
                                  drop_last=True,
                                  num_workers=0)
In [94]:
 
Out[94]:
{'image': array([[[41, 41, 41, ..., 41, 41, 41],
         [41, 41, 41, ..., 41, 41, 41],
         [41, 41, 41, ..., 41, 41, 41],
         ...,
         [41, 41, 41, ..., 41, 41, 41],
         [41, 41, 41, ..., 41, 41, 41],
         [41, 41, 41, ..., 41, 41, 41]]], dtype=uint8),
 'label': 21}
In [99]:
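import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm

def run_gradient_descent(model, train_data, batch_size=64, learning_rate=0.01, weight_decay=0, num_epochs=10):
    model.train()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    # Build the loader from the dataset that was passed in instead of relying on a global
    train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True, drop_last=True)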
    
    with tqdm(total=num_epochs * len(train_loader)) as pbar:
        for epoch in range(num_epochs):
            for i, batch in enumerate(train_loader):
                pbar.update(1)
                xs,ts = batch["image"], batch["label"]
                xs,ts = xs.float(), ts.long()
                if len(ts) != batch_size:
                    continue
                xs = xs.view(-1, 128*128)
                zs = model(xs)
                loss = criterion(zs, ts)
                loss.backward()
                optimizer.step()
                optimizer.zero_grad()
    return model
In [114]:
network = nn.Linear(128*128, 36).float()
network = run_gradient_descent(network, sign_lang_dataset)
100%|██████████| 1510/1510 [03:53<00:00,  6.47it/s]
In [123]:
 
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [123], in <cell line: 2>()
      3 x = sample["image"].numpy()
      4 y = sample["label"].numpy()
----> 5 prediction = leader_board_predict_fn(x)
      6 accuracies.append(accuracy_score(y, prediction, normalize=True))

Input In [108], in leader_board_predict_fn(input_batch)
     28 input_batch = torch.from_numpy(input_batch).float()
     30 # A forward pass with the input batch produces a batch of logits
     31 # In the network that we use here, Softmax is not applied to the output
     32 # This may be different for your network.
---> 33 logits = network(input_batch)
     35 # Final classification predictions are taken by taking an argmax over the logits
     36 # The prediction is converted to a numpy array
     37 prediction = torch.argmax(logits, dim=1).numpy()

File /opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py:1110, in Module._call_impl(self, *input, **kwargs)
   1106 # If we don't have any hooks, we want to skip the rest of the logic in
   1107 # this function, and just call forward.
   1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110     return forward_call(*input, **kwargs)
   1111 # Do not call functions when jit is used
   1112 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/conda/lib/python3.9/site-packages/torch/nn/modules/linear.py:103, in Linear.forward(self, input)
    102 def forward(self, input: Tensor) -> Tensor:
--> 103     return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (8192x128 and 16384x36)
In [108]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from tqdm import tqdm

def leader_board_predict_fn(input_batch):
    """
    Function for making predictions using your trained model.
    
    Args:
        input_batch (numpy array): Input images (4D array of shape 
                                   [batch_size, 1, 128, 128])
        
    Returns:
        output (numpy array): Predictions of your trained model 
                             (1D array of int (0-35) of shape [batch_size, ])
    """
    prediction = None
    batch_size, channels, height, width = input_batch.shape
    
    # Set the network to evaluation mode
    network.eval()
    
    # VERY IMPORTANT
    # Convert the input batch to a torch Tensor and set
    # the data type to the same type as the network
    input_batch = torch.from_numpy(input_batch).float()
       
    # A forward pass with the input batch produces a batch of logits
    # In the network that we use here, Softmax is not applied to the output
    # This may be different for your network.
    logits = network(input_batch)
    
    # Final classification predictions are taken by taking an argmax over the logits
    # The prediction is converted to a numpy array
    prediction = torch.argmax(logits, dim=1).numpy()
    
    assert prediction is not None, "Prediction cannot be None"
    assert isinstance(prediction, np.ndarray), "Prediction must be a numpy array"

    return prediction
Evaluation
Your final model will be evaluated on a hidden test set containing images similar to the dataset that you are provided with.
For evaluating the performance of your model, we will use the normalized accuracy_score metric from sklearn. This is simply the fraction of correct predictions that your model makes over all the images of the hidden test set. Hence, if all the predictions are correct, the score is 1.0, and if all predictions are incorrect, the score is 0.0. We use the sklearn metric so that the accuracy function is agnostic to the machine learning framework you use.
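As a rough illustration of what this metric computes, the sketch below hand-rolls the same fraction in plain Python. The function name `fraction_correct` is made up for this example; it is not sklearn's implementation, just an equivalent calculation for 1-D label lists.

```python
def fraction_correct(y_true, y_pred):
    """Illustrative stand-in for accuracy_score(y_true, y_pred, normalize=True)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count positions where the predicted label equals the ground truth
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

y_true = [12, 35, 6, 20]
y_pred = [12, 35, 6, 21]
print(fraction_correct(y_true, y_pred))  # 0.75
```

Because the evaluation loader uses drop_last=True, every batch has the same size, so averaging the per-batch scores gives the same value as scoring all evaluated images at once.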
In [109]:
from sklearn.metrics import accuracy_score
  
def accuracy(dataset_path, max_batches=30):
    """
    Calculates the average prediction accuracy.
    
    IMPORTANT
    =========
    In this function, we use PyTorch only for loading the data. When your `leader_board_predict_fn`
    function is called, we pass the arguments to it as numpy arrays. The output of `leader_board_predict_fn`
    is also expected to be a numpy array. So, as long as your `leader_board_predict_fn` function takes
    numpy arrays as input and produces numpy arrays as output (with the proper shapes), it does not
    matter what framework you used for training your network or for producing your predictions.
    
    Args:
        dataset_path (str): Path of the dataset directory
        
    Returns:
        accuracy (float): Average accuracy score over all images (float in the range 0.0-1.0)
    """

    # Create a Dataset object
    sign_lang_dataset = SignLangDataset(csv_file="labels.csv", root_dir=dataset_path)

    # Create a Dataloader
    sign_lang_dataloader = DataLoader(sign_lang_dataset, 
                                      batch_size=64,
                                      shuffle=True, 
                                      drop_last=True,
                                      num_workers=0)
    
    # Calculate accuracy for each batch
    accuracies = list()
    for batch_idx, sample in enumerate(sign_lang_dataloader):
        x = sample["image"].numpy()
        y = sample["label"].numpy()
        prediction = leader_board_predict_fn(x)
        accuracies.append(accuracy_score(y, prediction, normalize=True))
        
        # We will consider only the first max_batches batches (30 by default)
        if batch_idx == (max_batches - 1):
            break

    assert len(accuracies) == max_batches
    
    # Return the average accuracy
    mean_accuracy = np.mean(accuracies)
    return mean_accuracy
We will now use your leader_board_predict_fn function for calculating the accuracy of your model. We provide the code for testing your loaded model on the visible training data. We will also evaluate your model's performance on the test dataset (the test dataset should only be used for evaluation and is NOT to be used for training your model).
In [124]:
def get_score():
    """
    Function to compute scores for train and test datasets.
    """
    import torch
    import numpy as np
    from sklearn.metrics import accuracy_score
    import os
    import pwd
    import time
    import pathlib
    import pandas as pd
    import datetime
    
    ### LEADER BOARD TEST
    seed = 200

    torch.manual_seed(seed)
    np.random.seed(seed)

    # Calculate the accuracy on the training dataset
    # to check that your `leader_board_predict_fn` function 
    # works without any error
    dataset_score = accuracy(dataset_path=DATASET_PATH)

    assert isinstance(dataset_score, float), f"type of dataset_score is {type(dataset_score)}, but it must be float"
    assert 0.0<=dataset_score<=1.0, f"Value of dataset_score is {dataset_score}, but it must be between 0.0 and 1.0"

    # This is your accuracy score on the visible training dataset
    # This is NOT used for the leaderboard.
    print(f"Accuracy score on training data: {dataset_score}")

    # There is a hidden test that will evaluate your trained model on the hidden test set
    # This hidden dataset and the accuracy for this will not be visible to you when you
    # validate this notebook. The accuracy score on the hidden dataset will be used
    # for calculating your leaderboard score.

    seed = 200

    torch.manual_seed(seed)
    np.random.seed(seed)

    user_id = pwd.getpwuid( os.getuid() ).pw_name
    curtime = time.time()
    dt_now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M")

    try:
        HIDDEN_DATASET_PATH = os.path.expanduser("/data/mlproject22-test-data/sign_lang_test")
        hiddendataset_score = accuracy(dataset_path=HIDDEN_DATASET_PATH)

        assert isinstance(hiddendataset_score, float), f"type of hiddendataset_score is {type(hiddendataset_score)}, but it must be float"
        assert 0.0<=hiddendataset_score<=1.0, f"Value of hiddendataset_score is {hiddendataset_score}, but it must be between 0.0 and 1.0"

        print(f"Leaderboard score: {hiddendataset_score}")

        score_dict = dict(
            score_hidden=hiddendataset_score,
            score_train=dataset_score,
            unixtime=curtime,
            user=user_id,
            dt=dt_now,
            comment="",
        )

    except Exception as e:
        err = str(e)
        score_dict = dict(
            score_hidden=float("nan"),
            score_train=dataset_score,
            unixtime=curtime,
            user=user_id,
            dt=dt_now,
            comment=err
        )


    #if list(pathlib.Path(os.getcwd()).parents)[0].name == 'source':
    #    print("we are in the source directory... replacing values.")
    #    print(pd.DataFrame([score_dict]))
    #    score_dict["score_hidden"] = -1
    #    score_dict["score_train"] = -1
    #    print("new values:")
    #    print(pd.DataFrame([score_dict]))

    pd.DataFrame([score_dict]).to_csv("sign_lang.csv", index=False)

    ### LEADER BOARD TEST
    
get_score()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [124], in <cell line: 89>()
     85     pd.DataFrame([score_dict]).to_csv("sign_lang.csv", index=False)
     87     ### LEADER BOARD TEST
---> 89 get_score()

Input In [124], in get_score()
     19 np.random.seed(seed)
     21 # Calculate the accuracy on the training dataset
     22 # to check that your `leader_board_predict_fn` function 
     23 # works without any error
---> 24 dataset_score = accuracy(dataset_path=DATASET_PATH)
     26 assert isinstance(dataset_score, float), f"type of dataset_score is {type(dataset_score)}, but it must be float"
     27 assert 0.0<=dataset_score<=1.0, f"Value of dataset_score is {dataset_score}, but it must be between 0.0 and 1.0"

Input In [109], in accuracy(dataset_path, max_batches)
     35 x = sample["image"].numpy()
     36 y = sample["label"].numpy()
---> 37 prediction = leader_board_predict_fn(x)
     38 accuracies.append(accuracy_score(y, prediction, normalize=True))
     40 # We will consider only the first 30 batches

Input In [108], in leader_board_predict_fn(input_batch)
     28 input_batch = torch.from_numpy(input_batch).float()
     30 # A forward pass with the input batch produces a batch of logits
     31 # In the network that we use here, Softmax is not applied to the output
     32 # This may be different for your network.
---> 33 logits = network(input_batch)
     35 # Final classification predictions are taken by taking an argmax over the logits
     36 # The prediction is converted to a numpy array
     37 prediction = torch.argmax(logits, dim=1).numpy()

File /opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py:1110, in Module._call_impl(self, *input, **kwargs)
   1106 # If we don't have any hooks, we want to skip the rest of the logic in
   1107 # this function, and just call forward.
   1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110     return forward_call(*input, **kwargs)
   1111 # Do not call functions when jit is used
   1112 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/conda/lib/python3.9/site-packages/torch/nn/modules/linear.py:103, in Linear.forward(self, input)
    102 def forward(self, input: Tensor) -> Tensor:
--> 103     return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (8192x128 and 16384x36)

Karthik_Ganesan · July 4, 2022, 8:26pm

Based on the last line of the traceback, the error seems to be a shape mismatch between your input and your Linear layer. You are creating the network as:

network = nn.Linear(128*128, 36)

This means that the layer (and therefore your entire network) expects an input with 128x128 = 16384 features and produces 36 outputs, one per class. But it seems you are providing an input that is treated as 8192x128. That would be consistent with a [64, 1, 128, 128] batch being passed in without flattening, since nn.Linear only acts on the last dimension (64*1*128 = 8192 rows of 128 features). What is the shape of your input? Make sure it matches the shape expected by your Linear layer to fix this error.
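A minimal sketch of the mismatch and one possible fix, assuming the input really is a [64, 1, 128, 128] batch as in the evaluation code. The `network` here is an untrained stand-in with random weights and the batch is dummy data, so only the shapes are meaningful.

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-in for the trained model from the question (weights are random here)
network = nn.Linear(128 * 128, 36)

# A dummy batch with the shape that the evaluation code passes in
input_batch = np.zeros((64, 1, 128, 128), dtype=np.uint8)
x = torch.from_numpy(input_batch).float()

# nn.Linear acts on the last dimension only, so without flattening the
# batch is treated as 64 * 1 * 128 = 8192 rows of 128 features each
try:
    network(x)
    raised = False
except RuntimeError as err:
    raised = True
    print(err)

# Flatten every image to a 16384-element vector, as the training loop does
flat = x.view(x.shape[0], -1)
with torch.no_grad():
    logits = network(flat)
prediction = torch.argmax(logits, dim=1).numpy()
print(flat.shape, logits.shape, prediction.shape)
```

In `leader_board_predict_fn`, the equivalent change would be to reshape `input_batch` to `[batch_size, 128*128]` right before the call to `network(...)`, mirroring the `xs.view(-1, 128*128)` step in `run_gradient_descent`.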