Large performance difference when using torchvision EMNIST dataset vs custom dataset

Hi! I am training a classification model on the EMNIST dataset. I noticed a large difference in the convergence of the loss when using the EMNIST dataset from torchvision (torchvision.datasets.EMNIST) compared to loading the same data from CSV files. I tested with the same model hyperparameters and the same random seed.
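For reference, the torchvision version is loaded roughly like this (a sketch: the split, root path, and transform arguments here are illustrative, reusing the transforms defined further below):

from torchvision import datasets

# "letters" split: 26 classes, upper- and lowercase merged, labels 1-26; root path is a placeholder
tv_train_set = datasets.EMNIST(
    root="data",
    split="letters",
    train=True,
    download=True,
    transform=img_transform,            # same transform pipeline as defined below
    target_transform=target_transform,  # shift labels from 1-26 to 0-25
)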

I constructed a custom dataset class to load the data from CSV files:

import numpy as np
import pandas as pd

def load_csv_data(csv_path: str):
    # Load data from CSV; used to create img_data & label_data
    data_df = pd.read_csv(csv_path)
    labels = data_df.iloc[:, 0].values  # first column; labels cover uppercase & lowercase, e.g. "a" or "A" = 1, range 1-26 (shifted to 0-25 later)
    raw_pixels = data_df.iloc[:, 1:].values  # pixel values; orientation is corrected later by the transforms

    imgs = raw_pixels.reshape(-1, 28, 28)  # EMNIST images are 28x28 pixels -> 3D ndarray
    return imgs, labels

from torch.utils.data import Dataset

class EMNISTDataset(Dataset):
    def __init__(self, img_data: np.ndarray, label_data: np.ndarray, transform, target_transform, subset_indices: slice):
        super().__init__()
        self.img_subset = img_data[subset_indices]
        self.label_subset = label_data[subset_indices]

        if self.img_subset.shape[0] != self.label_subset.shape[0]:
            raise RuntimeError(
                f"Img subset len does not match label subset len:\n"
                f"Img subset len: {self.img_subset.shape[0]}\n"
                f"Label subset len: {self.label_subset.shape[0]}"
            )
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return self.label_subset.shape[0]

    def __getitem__(self, idx):
        X = self.img_subset[idx, :, :]
        y = int(self.label_subset[idx])
        if self.transform:
            X = self.transform(X)  # applies the torchvision image transform
        if self.target_transform:
            y = self.target_transform(y)  # applies the target transform
        return X, y

I am using these transforms:

import torch
from torchvision.transforms import v2

img_transform = v2.Compose([
    v2.Resize(28),
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),  # convert to a float32 tensor
    v2.RandomHorizontalFlip(p=1),  # always flip (100% probability)
    v2.RandomRotation(degrees=(90, 90)),  # rotate exactly 90 degrees to fix EMNIST orientation
    ])
target_transform = lambda y: y - 1  # shift labels from 1-26 to 0-25
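For completeness, this is roughly how everything is wired together; the CSV path, the subset slice, and the batch size below are placeholders, not the exact values from my runs:

from torch.utils.data import DataLoader

imgs, labels = load_csv_data("emnist-letters-train.csv")  # placeholder path
train_ds = EMNISTDataset(imgs, labels, img_transform, target_transform, subset_indices=slice(0, 100_000))
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)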

This is the cross-entropy loss from multiple runs; the worse ones (plateauing around 3.2, roughly chance level for 26 classes) are from my custom dataset, while the better ones are from the torchvision dataset:

Compare a batch created by the torchvision dataset with one created by your custom dataset w.r.t. their stats (e.g. mean, min, max, std) to see if the processing differs.
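Something along these lines (the loader names are placeholders; both are assumed to yield (images, labels) batches):

batch_csv, _ = next(iter(train_loader_csv))  # loader built from the custom CSV dataset
batch_tv, _ = next(iter(train_loader_tv))    # loader built from torchvision.datasets.EMNIST

for name, batch in [("csv", batch_csv), ("torchvision", batch_tv)]:
    print(f"{name}: mean={batch.mean():.4f} std={batch.std():.4f} "
          f"min={batch.min():.4f} max={batch.max():.4f} dtype={batch.dtype}")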

I compared the batches and found that there were big differences, e.g. max: 0.33 vs 3e-18. I fixed it by converting the image arrays to PIL images.
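Concretely, the fix looks roughly like this inside __getitem__ (the uint8 cast is my assumption about the pixel dtype coming from the CSV):

from PIL import Image

# Inside EMNISTDataset.__getitem__, before applying the transform:
X = self.img_subset[idx, :, :]
X = Image.fromarray(X.astype(np.uint8))  # ndarray -> PIL image so the v2 transforms process it as expected
if self.transform:
    X = self.transform(X)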