NA labels in EMNIST dataset

Hi, I am running the code below to load the EMNIST letters data. After running 'dataset.classes', I get an extra "N/A" label as a class. Why does this N/A label appear?

torch version: 2.1.0
torchvision version: 0.16.0

import torch
import torchvision
from torchvision.datasets import EMNIST
dataset = EMNIST(root='data/', split='letters', download=True)
dataset.classes

In the source code the classes for EMNIST are defined as:

classes_split_dict = {
        "byclass": sorted(list(_all_classes)),
        "bymerge": sorted(list(_all_classes - _merged_classes)),
        "balanced": sorted(list(_all_classes - _merged_classes)),
        "letters": ["N/A"] + list(string.ascii_lowercase),
        "digits": list(string.digits),
        "mnist": list(string.digits),
}

This raises the question of why the letters split needs an "N/A" class. I thought some targets might fall outside the range a-z, so I checked the targets.

import collections

from torchvision import transforms
from torchvision.datasets import EMNIST

dataset = EMNIST(root='data', split='letters', transform=transforms.ToTensor(), download=True)

# Count images per letter; targets are mapped as 1 -> 'a', ..., 26 -> 'z'.
data_classes = collections.defaultdict(int)
for i in range(len(dataset)):
    data_classes[chr(ord('a') - 1 + dataset.targets[i].item())] += 1
print([(x, data_classes[x]) for x in sorted(data_classes.keys())])

The output showed 4800 images for each letter, so every image has a valid target and none maps to N/A. I am not sure what the N/A entry is for, since all images fall within a valid target class. It might be possible to remove it from the classes. Either way, it shouldn't affect the usability of the dataset.
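
One possible reading, given that the counting code maps target t to chr(ord('a') - 1 + t): if the targets are 1-based, the "N/A" entry may simply be a placeholder for index 0, so that indexing the class list with a target lands on the right letter. The sketch below uses only the standard library and rebuilds the list the same way as the "letters" entry quoted earlier. Note that target_to_letter is a hypothetical helper, and the 1-to-26 target range is an assumption drawn from the check above, not something I have confirmed in the torchvision documentation.

```python
import string

# Same construction as the "letters" entry in classes_split_dict.
letters_classes = ["N/A"] + list(string.ascii_lowercase)


def target_to_letter(target: int) -> str:
    """Hypothetical helper: look up the class name for a letters target."""
    return letters_classes[target]


# Assumption: targets run from 1 ('a') to 26 ('z'), so index 0 is never used.
print(len(letters_classes))   # 27 entries: the placeholder plus 26 letters
print(target_to_letter(1))    # a
print(target_to_letter(26))   # z

# Without the placeholder, the same lookup would be shifted by one.
without_placeholder = list(string.ascii_lowercase)
print(without_placeholder[1])  # b
```

If that reading is right, dropping "N/A" would shift every lookup by one unless the targets were re-indexed as well.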