Hello everyone,
During a course called “Machine Learning” I was given the exercise of building a neural network that can recognize sign language images. Training the network with gradient descent works, but the function “leader_board_predict_fn()” throws an error when I try to evaluate the accuracy of my network. Any help is much appreciated, since I’m just starting out with Python, Torch and other libraries.
Thanks a lot in advance!
Code:
Sign Language Dataset
The Sign Language Dataset consists of 9680 grayscale images of hand signs for the digits 0-9 and the letters a-z. Thus, this is a multiclass classification problem with 36 classes. Your task is to build a machine learning model that can accurately classify images from this dataset.
Loading the dataset
You do not need to upload any data. Both the visible training dataset and the hidden test dataset are already available on the Jupyter hub.
In [31]:
import os
import csv
import cv2
import random
import numpy as np
import matplotlib.pyplot as plt
In [32]:
# Setting the path of the training dataset (that was already provided to you)
running_local = os.getenv('JUPYTERHUB_USER') is None
DATASET_PATH = "."
# Set the location of the dataset
if running_local:
    # If running on your local machine, the sign_lang_train folder's path should be specified here
    local_path = "sign_lang_train"
    if os.path.exists(local_path):
        DATASET_PATH = local_path
else:
    # If running on the Jupyter hub, this data folder is already available
    # You DO NOT need to upload the data!
    DATASET_PATH = "/data/mlproject22/sign_lang_train"
In [33]:
# Utility function
def read_csv(csv_file):
    with open(csv_file, newline='') as f:
        reader = csv.reader(f)
        data = list(reader)
    return data
Data Loading using PyTorch
For creating and training your model, you can work with any machine learning library of your choice.
If you choose to work with PyTorch, you will need to create your own Dataset class for loading the data. This is provided below. See here for a nice example of how to create a custom data loading pipeline in PyTorch.
In [88]:
import torch
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision import transforms, utils, io
from torchvision.utils import make_grid
from string import ascii_lowercase
class SignLangDataset(Dataset):
    """Sign language dataset"""
    def __init__(self, csv_file, root_dir, class_index_map=None, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied on a sample.
        """
        self.data = read_csv(os.path.join(root_dir, csv_file))
        self.root_dir = root_dir
        self.class_index_map = class_index_map
        self.transform = transform
        # List of class names in order
        self.class_names = list(map(str, list(range(10)))) + list(ascii_lowercase)
    def __len__(self):
        """
        Calculates the length of the dataset.
        """
        return len(self.data)
    def __getitem__(self, idx):
        """
        Returns one sample (dict consisting of an image and its label)
        """
        if torch.is_tensor(idx):
            idx = idx.tolist()
        # Read the image and labels
        image_path = os.path.join(self.root_dir, self.data[idx][1])
        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        # Add a leading channel axis so the image has shape C,H,W where C=1
        image = np.expand_dims(image, 0)
        # The label is the index of the class name in the list ['0','1',...,'9','a','b',...'z']
        # because we should have integer labels in the range 0-35 (for 36 classes)
        label = self.class_names.index(self.data[idx][0])
        sample = {'image': image, 'label': label}
        #if self.transform:
        #    sample = self.transform(sample)
        return sample
In [106]:
import glob
csv_file = os.path.basename(glob.glob(f"{DATASET_PATH}/*.csv")[-1])
# Create dataset
sign_lang_dataset = SignLangDataset(csv_file=csv_file, root_dir=DATASET_PATH)
# Create dataloader
# Check the documentation here: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
# Here we use a batch size of 64; set this value appropriately while training
batch_size = 64
sign_lang_dataloader = DataLoader(dataset=sign_lang_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
print(f'Number of batches: {len(sign_lang_dataloader)}')
for data in sign_lang_dataloader:
    print(f'Shape of image batch: {data["image"].shape}')
    print(f'Len of labels in batch: {len(data["label"])}')
    # This shows the ground truth label for each image in the batch
    print(f'Example labels: {data["label"]}')
    # We exit the loop after getting the stats from 1 batch
    break
# Now let us display the images and labels from the first batch
for batch_num, data in enumerate(sign_lang_dataloader):
    fig, axes = plt.subplots(1, batch_size, figsize=(20,4))
    for i, ax in enumerate(axes.flatten()):
        ax.imshow(data["image"][i,0], cmap="bone")
        ax.set_title(f"Label: {data['label'][i]} \n shape: {list(data['image'][i].shape)}")
    break
Number of batches: 151
Shape of image batch: torch.Size([64, 1, 128, 128])
Len of labels in batch: 64
Example labels: tensor([19, 21, 35, 10, 9, 3, 18, 0, 34, 30, 4, 9, 8, 4, 19, 17, 35, 12,
31, 17, 22, 32, 25, 12, 24, 21, 6, 8, 11, 11, 31, 4, 16, 21, 21, 25,
16, 28, 25, 34, 17, 27, 6, 4, 31, 0, 6, 10, 25, 34, 28, 12, 13, 18,
21, 21, 30, 25, 13, 18, 4, 31, 28, 33])
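The batch count printed above can be checked by hand. This is only a small sketch, assuming the labels file lists all 9680 images mentioned in the dataset description and that the loader keeps the settings used here (batch_size=64, drop_last=True):

```python
# Values taken from the dataset description and the DataLoader call above
num_images = 9680
batch_size = 64

# drop_last=True discards the final incomplete batch, so integer division applies
num_batches = num_images // batch_size
dropped_images = num_images % batch_size

print(num_batches)      # 151
print(dropped_images)   # 16
```

Under these assumptions, 16 images are left out of every pass over the data, and which ones are dropped changes from pass to pass because shuffle=True.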
In [36]:
# Variable is not needed in current PyTorch; a plain tensor works
labels = data['label'].long()
print(labels)
tensor([12, 35, 6, 20, 35, 21, 34, 21])
Prediction Stub
You will need to provide a function that can be used to make predictions using your final trained model.
IMPORTANT
The name of your prediction function must be leader_board_predict_fn
Your prediction function should be able to take as input a 4-D numpy array of shape [batch_size,1,128,128] and produce predictions in the form of a 1-D numpy array of shape [batch_size,].
Predictions for each image should be an integer in the range 0-35, that is, 0 for the digit 0, 1 for the digit 1, ..., 9 for the digit 9, 10 for the letter a, 11 for the letter b, ..., 35 for the letter z.
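The label encoding described above can be sketched in a few lines. The helper names `index_to_name` and `name_to_index` are made up for illustration; only the ordering (digits first, then lowercase letters) comes from the `class_names` list in `SignLangDataset`:

```python
from string import ascii_lowercase

# Same ordering as SignLangDataset.class_names: digits first, then letters
class_names = [str(d) for d in range(10)] + list(ascii_lowercase)

def index_to_name(index):
    """Hypothetical helper: class index (0-35) -> character."""
    return class_names[index]

def name_to_index(name):
    """Hypothetical helper: character -> class index (0-35)."""
    return class_names.index(name)

print(len(class_names))     # 36
print(name_to_index("a"))   # 10
print(index_to_name(35))    # z
```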
Your prediction function should internally load your trained model and take care of any data transformations that you need.
Below we provide an implementation of the leader_board_predict_fn function, in which we show how a trained model can be loaded (from the weights saved on the disk) for making predictions. This example is for PyTorch, but you are free to use any framework of your choice for your model. The only requirement is that this function should accept a numpy array (with the proper shape) as the input and should produce a numpy array (with the proper shape) as the output. What you do internally is up to you.
Note that the model that we load here is not properly trained and so its performance is very bad. This example is only for showing you how a model can be loaded in PyTorch and how predictions can be made.
In [107]:
# Set dataset path
dataset_path = DATASET_PATH
# Create a Dataset object
sign_lang_dataset = SignLangDataset(csv_file="labels.csv", root_dir=dataset_path)
# Create a Dataloader
sign_lang_dataloader = DataLoader(sign_lang_dataset,
batch_size=64,
shuffle=True,
drop_last=True,
num_workers=0)
In [94]:
Out[94]:
{'image': array([[[41, 41, 41, ..., 41, 41, 41],
[41, 41, 41, ..., 41, 41, 41],
[41, 41, 41, ..., 41, 41, 41],
...,
[41, 41, 41, ..., 41, 41, 41],
[41, 41, 41, ..., 41, 41, 41],
[41, 41, 41, ..., 41, 41, 41]]], dtype=uint8),
'label': 21}
In [99]:
import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm
def run_gradient_descent(model, train_data, batch_size=64, learning_rate=0.01, weight_decay=0, num_epochs=10):
    model.train()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    # Build the loader from the dataset that was passed in instead of relying on a global
    train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True, drop_last=True)
    with tqdm(total=num_epochs * len(train_loader)) as pbar:
        for epoch in range(num_epochs):
            for i, batch in enumerate(train_loader):
                pbar.update(1)
                xs, ts = batch["image"], batch["label"]
                xs, ts = xs.float(), ts.long()
                if len(ts) != batch_size:
                    continue
                # Flatten each image to a vector of 128*128 values
                xs = xs.view(-1, 128*128)
                zs = model(xs)
                loss = criterion(zs, ts)
                loss.backward()
                optimizer.step()
                optimizer.zero_grad()
    return model
In [114]:
network = nn.Linear(128*128, 36).float()
network = run_gradient_descent(network, sign_lang_dataset)
100%|██████████| 1510/1510 [03:53<00:00, 6.47it/s]
In [123]:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Input In [123], in <cell line: 2>()
3 x = sample["image"].numpy()
4 y = sample["label"].numpy()
----> 5 prediction = leader_board_predict_fn(x)
6 accuracies.append(accuracy_score(y, prediction, normalize=True))
Input In [108], in leader_board_predict_fn(input_batch)
28 input_batch = torch.from_numpy(input_batch).float()
30 # A forward pass with the input batch produces a batch of logits
31 # In the network that we use here, Softmax is not applied to the output
32 # This may be different for your network.
---> 33 logits = network(input_batch)
35 # Final classification predictions are taken by taking an argmax over the logits
36 # The prediction is converted to a numpy array
37 prediction = torch.argmax(logits, dim=1).numpy()
File /opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File /opt/conda/lib/python3.9/site-packages/torch/nn/modules/linear.py:103, in Linear.forward(self, input)
102 def forward(self, input: Tensor) -> Tensor:
--> 103 return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (8192x128 and 16384x36)
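The numbers in the error message can be reproduced with plain arithmetic. This is a sketch, assuming the batch reaches the network unflattened with shape [64, 1, 128, 128] (the shape printed by the data loader) and that `nn.Linear` treats the last axis as the feature axis and all leading axes as rows:

```python
# Shape of one batch as delivered by the data loader
batch_size, channels, height, width = 64, 1, 128, 128

# Without flattening, the last axis is taken as the feature axis
rows_seen = batch_size * channels * height
features_seen = width

# nn.Linear(128*128, 36) maps 16384 input features to 36 classes
features_expected = 128 * 128
num_classes = 36

print((rows_seen, features_seen))           # (8192, 128)
print((features_expected, num_classes))     # (16384, 36)
print(features_seen == features_expected)   # False
```

If that reading is right, flattening each image to 16384 values before the forward pass, as the training loop does with `xs.view(-1, 128*128)`, would make the inner dimensions agree.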
In [108]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from tqdm import tqdm
def leader_board_predict_fn(input_batch):
    """
    Function for making predictions using your trained model.
    Args:
        input_batch (numpy array): Input images (4D array of shape
                                   [batch_size, 1, 128, 128])
    Returns:
        output (numpy array): Predictions of your trained model
                              (1D array of int (0-35) of shape [batch_size, ])
    """
    prediction = None
    batch_size, channels, height, width = input_batch.shape
    # Set the network to evaluation mode
    network.eval()
    # VERY IMPORTANT
    # Convert the input batch to a torch Tensor and set
    # the data type to the same type as the network
    input_batch = torch.from_numpy(input_batch).float()
    # A forward pass with the input batch produces a batch of logits
    # In the network that we use here, Softmax is not applied to the output
    # This may be different for your network.
    # (This is the call that raises the RuntimeError shown in the traceback.)
    logits = network(input_batch)
    # Final classification predictions are taken by taking an argmax over the logits
    # The prediction is converted to a numpy array
    prediction = torch.argmax(logits, dim=1).numpy()
    assert prediction is not None, "Prediction cannot be None"
    assert isinstance(prediction, np.ndarray), "Prediction must be a numpy array"
    return prediction
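A possible way around the error, shown only as a sketch: flatten the batch inside the prediction function before the linear layer sees it. The names `sketch_network` and `predict_flattened` are hypothetical, and `sketch_network` is an untrained stand-in for the `nn.Linear(128*128, 36)` model trained above, so its predictions are meaningless; the point is only the shapes.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained model: an untrained linear
# classifier over flattened 128x128 images with 36 output classes
sketch_network = nn.Linear(128 * 128, 36)

def predict_flattened(input_batch):
    """Map a [batch_size, 1, 128, 128] numpy array to [batch_size] class indices."""
    sketch_network.eval()
    x = torch.from_numpy(input_batch).float()
    # Flatten every image to a vector of 16384 values, as in the training loop
    x = x.reshape(x.shape[0], -1)
    with torch.no_grad():
        logits = sketch_network(x)
    return torch.argmax(logits, dim=1).numpy()

dummy_batch = np.zeros((4, 1, 128, 128), dtype=np.uint8)
predictions = predict_flattened(dummy_batch)
print(predictions.shape)   # (4,)
```

Reshaping with `x.shape[0]` rather than a fixed 64 keeps the function usable for any batch size, which matters if the hidden test set is evaluated with a different one.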
Evaluation
Your final model will be evaluated on a hidden test set containing images similar to the dataset that you are provided with.
For evaluating the performance of your model, we will use the normalized accuracy_score metric from sklearn. This is simply the fraction of correct predictions that your model makes for all the images of the hidden test set. Hence, if all the predictions are correct, the score is 1.0 and if all predictions are incorrect, the score is 0.0. We will use the sklearn metric so that the accuracy function is agnostic to the machine learning framework you use.
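To make the metric concrete, here is a plain-Python sketch of what a normalized accuracy computes. `fraction_correct` is a hypothetical helper written for illustration, not sklearn's implementation, and the label values are made up:

```python
def fraction_correct(y_true, y_pred):
    """Hypothetical helper: share of positions where prediction equals label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

labels = [3, 10, 35, 0]
print(fraction_correct(labels, [3, 10, 35, 0]))   # 1.0
print(fraction_correct(labels, [3, 10, 34, 1]))   # 0.5
print(fraction_correct(labels, [4, 11, 34, 1]))   # 0.0
```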
In [109]:
from sklearn.metrics import accuracy_score
def accuracy(dataset_path, max_batches=30):
    """
    Calculates the average prediction accuracy.
    IMPORTANT
    =========
    In this function, we use PyTorch only for loading the data. When your `leader_board_predict_fn`
    function is called, we pass the arguments to it as numpy arrays. The output of `leader_board_predict_fn`
    is also expected to be a numpy array. So, as long as your `leader_board_predict_fn` function takes
    numpy arrays as input and produces numpy arrays as output (with the proper shapes), it does not
    matter what framework you used for training your network or for producing your predictions.
    Args:
        dataset_path (str): Path of the dataset directory
        max_batches (int): Number of batches to evaluate
    Returns:
        accuracy (float): Average accuracy score over all images (float in the range 0.0-1.0)
    """
    # Create a Dataset object
    sign_lang_dataset = SignLangDataset(csv_file="labels.csv", root_dir=dataset_path)
    # Create a Dataloader
    sign_lang_dataloader = DataLoader(sign_lang_dataset,
                                      batch_size=64,
                                      shuffle=True,
                                      drop_last=True,
                                      num_workers=0)
    # Calculate accuracy for each batch
    accuracies = list()
    for batch_idx, sample in enumerate(sign_lang_dataloader):
        x = sample["image"].numpy()
        y = sample["label"].numpy()
        prediction = leader_board_predict_fn(x)
        accuracies.append(accuracy_score(y, prediction, normalize=True))
        # We will consider only the first 30 batches
        if batch_idx == (max_batches - 1):
            break
    assert len(accuracies) == max_batches
    # Return the average accuracy
    mean_accuracy = np.mean(accuracies)
    return mean_accuracy
We will now use your leader_board_predict_fn function for calculating the accuracy of your model. We provide the code for testing your loaded model on the visible training data. We will also evaluate your model's performance on the test dataset (the test dataset should only be used for evaluation and is NOT to be used for training your model).
In [124]:
def get_score():
    """
    Function to compute scores for train and test datasets.
    """
    import torch
    import numpy as np
    from sklearn.metrics import accuracy_score
    import os
    import pwd
    import time
    import pathlib
    import pandas as pd
    import datetime
    ### LEADER BOARD TEST
    seed = 200
    torch.manual_seed(seed)
    np.random.seed(seed)
    # Calculate the accuracy on the training dataset
    # to check that your `leader_board_predict_fn` function
    # works without any error
    dataset_score = accuracy(dataset_path=DATASET_PATH)
    assert isinstance(dataset_score, float), f"type of dataset_score is {type(dataset_score)}, but it must be float"
    assert 0.0<=dataset_score<=1.0, f"Value of dataset_score is {dataset_score}, but it must be between 0.0 and 1.0"
    # This is your accuracy score on the visible training dataset
    # This is NOT used for the leaderboard.
    print(f"Accuracy score on training data: {dataset_score}")
    # There is a hidden test that will evaluate your trained model on the hidden test set
    # This hidden dataset and the accuracy for this will not be visible to you when you
    # validate this notebook. The accuracy score on the hidden dataset will be used
    # for calculating your leaderboard score.
    seed = 200
    torch.manual_seed(seed)
    np.random.seed(seed)
    user_id = pwd.getpwuid( os.getuid() ).pw_name
    curtime = time.time()
    dt_now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
    try:
        HIDDEN_DATASET_PATH = os.path.expanduser("/data/mlproject22-test-data/sign_lang_test")
        hiddendataset_score = accuracy(dataset_path=HIDDEN_DATASET_PATH)
        assert isinstance(hiddendataset_score, float), f"type of hiddendataset_score is {type(hiddendataset_score)}, but it must be float"
        assert 0.0<=hiddendataset_score<=1.0, f"Value of hiddendataset_score is {hiddendataset_score}, but it must be between 0.0 and 1.0"
        print(f"Leaderboard score: {hiddendataset_score}")
        score_dict = dict(
            score_hidden=hiddendataset_score,
            score_train=dataset_score,
            unixtime=curtime,
            user=user_id,
            dt=dt_now,
            comment="",
        )
    except Exception as e:
        err = str(e)
        score_dict = dict(
            score_hidden=float("nan"),
            score_train=dataset_score,
            unixtime=curtime,
            user=user_id,
            dt=dt_now,
            comment=err
        )
    #if list(pathlib.Path(os.getcwd()).parents)[0].name == 'source':
    #    print("we are in the source directory... replacing values.")
    #    print(pd.DataFrame([score_dict]))
    #    score_dict["score_hidden"] = -1
    #    score_dict["score_train"] = -1
    #    print("new values:")
    #    print(pd.DataFrame([score_dict]))
    pd.DataFrame([score_dict]).to_csv("sign_lang.csv", index=False)
### LEADER BOARD TEST
get_score()
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Input In [124], in <cell line: 89>()
85 pd.DataFrame([score_dict]).to_csv("sign_lang.csv", index=False)
87 ### LEADER BOARD TEST
---> 89 get_score()
Input In [124], in get_score()
19 np.random.seed(seed)
21 # Calculate the accuracy on the training dataset
22 # to check that your `leader_board_predict_fn` function
23 # works without any error
---> 24 dataset_score = accuracy(dataset_path=DATASET_PATH)
26 assert isinstance(dataset_score, float), f"type of dataset_score is {type(dataset_score)}, but it must be float"
27 assert 0.0<=dataset_score<=1.0, f"Value of dataset_score is {dataset_score}, but it must be between 0.0 and 1.0"
Input In [109], in accuracy(dataset_path, max_batches)
35 x = sample["image"].numpy()
36 y = sample["label"].numpy()
---> 37 prediction = leader_board_predict_fn(x)
38 accuracies.append(accuracy_score(y, prediction, normalize=True))
40 # We will consider only the first 30 batches
Input In [108], in leader_board_predict_fn(input_batch)
28 input_batch = torch.from_numpy(input_batch).float()
30 # A forward pass with the input batch produces a batch of logits
31 # In the network that we use here, Softmax is not applied to the output
32 # This may be different for your network.
---> 33 logits = network(input_batch)
35 # Final classification predictions are taken by taking an argmax over the logits
36 # The prediction is converted to a numpy array
37 prediction = torch.argmax(logits, dim=1).numpy()
File /opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File /opt/conda/lib/python3.9/site-packages/torch/nn/modules/linear.py:103, in Linear.forward(self, input)
102 def forward(self, input: Tensor) -> Tensor:
--> 103 return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (8192x128 and 16384x36)