Dataloader using pandas.read_csv

I have data files containing 5,184 (18×18×16) entries. Each file contains data from a different patient. My current dataloader setup creates a dataset of length 40 (the number of patients). Instead of treating each patient’s data separately, I want to combine them all so that the spectra can be shuffled across patients. There should be just over 207,000 spectra in the final dataset.
If I could save the data in a text file, then I could use readlines(). However, the DICOM data goes out to 6 decimal places, and the only way to maintain this precision from MATLAB is to export the data into .csv files. pd.read_csv seems to load each whole file as a single input instead of the 5,184 entries it consists of.
Is there a way to separate the data line by line so that when my network calls “for i, data in enumerate(dataset):”, the ‘i’ indexes the 200,000+ individual spectra instead of the 40 different patients?
Here is what’s called to actually load the data:

import os.path
from pandas import read_csv
import torch
from data.base_dataset import BaseDataset
from data.image_folder import make_dataset

class AlignedLabeledSpectralDatasetModified(BaseDataset):
    def initialize(self, opt):
        self.opt = opt
        self.root = opt.dataroot
        self.dir_AB = os.path.join(opt.dataroot, opt.phase)
        self.AB_paths = sorted(make_dataset(self.root)) # Returns a list of paths of the files in the dataset

    def __getitem__(self, index):
        AB_path = self.AB_paths[index]
        with open(AB_path, "r") as file:
            # Reads the entire patient file (5,184 rows) into one DataFrame
            AB_spectra = read_csv(file, names=['Spectra', 'NAA', 'cho'])

        A = AB_spectra.Spectra  # the whole column, i.e. every spectrum for this patient
        #B = [AB_spectra.NAA, AB_spectra.cho]

        return {'A_data': A, 'A_paths': AB_path}

    def __len__(self):
        return len(self.AB_paths)

    def name(self):
        return 'SingleSpectralDataset'

I think your problem statement is a little confusing. Couldn’t you just sort the data by patient in MATLAB and write a single new .csv file with all 207k entries?
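
If you would rather do that merge step in Python instead of MATLAB, a minimal one-off sketch could look like the following (the dataroot path and output file name are placeholders, and it assumes each per-patient file has the three columns used in your code):

import glob
import pandas as pd

# Combine every per-patient .csv into a single file with one row per
# spectrum (~207k rows for 40 patients).
paths = sorted(glob.glob('/path/to/dataroot/*.csv'))  # adjust to your layout
frames = [pd.read_csv(p, names=['Spectra', 'NAA', 'cho']) for p in paths]
merged = pd.concat(frames, ignore_index=True)
# float_format preserves the 6 decimal places exported from MATLAB.
merged.to_csv('all_patients.csv', index=False, float_format='%.6f')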

A word of advice, however: do all pandas-loading-related work in the init method of your Dataset, as in my experience this makes it run much, much faster.
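
Concretely, you could keep your current class and just move the loading into initialize. Here is an untested sketch along those lines; it assumes each row of your .csv files is one spectrum, and with only ~40 files the concatenated DataFrame should fit in memory comfortably:

from pandas import read_csv, concat
import torch
from data.base_dataset import BaseDataset
from data.image_folder import make_dataset

class AlignedLabeledSpectralDatasetModified(BaseDataset):
    def initialize(self, opt):
        self.opt = opt
        self.root = opt.dataroot
        self.AB_paths = sorted(make_dataset(self.root))
        # Read every patient file once, up front, and stack the rows so the
        # dataset indexes individual spectra (~207k) instead of patients (40).
        frames = []
        for path in self.AB_paths:
            df = read_csv(path, names=['Spectra', 'NAA', 'cho'])
            df['path'] = path  # remember which patient each row came from
            frames.append(df)
        self.AB_spectra = concat(frames, ignore_index=True)

    def __getitem__(self, index):
        row = self.AB_spectra.iloc[index]
        A = torch.tensor(row['Spectra'], dtype=torch.float32)
        return {'A_data': A, 'A_paths': row['path']}

    def __len__(self):
        return len(self.AB_spectra)

    def name(self):
        return 'SingleSpectralDataset'

With the dataset flattened like this, wrapping it in a DataLoader with shuffle=True will randomize the spectra across patients for you.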