Data Loader is not working

I have multiple CSV files that contain 1D data, and I want to use each row. Each file contains a different number of rows, so I have written a Dataset like this:

import numpy as np
import torch

class data_gen(torch.utils.data.Dataset):
    def __init__(self, files):
        # Expected to be a list of CSV file names
        self.files = files
        print("FILES: ", type(self.files))

    def __getitem__(self, i):
        print("GETITEM")
        # Load the i-th file on demand
        file1 = self.files[i]
        print("FILE1: ", file1)
        my_data = np.genfromtxt('/data/' + file1, delimiter=',')
        print(len(my_data))
        return my_data

    def __len__(self):
        return len(self.files)

However, when I call it like this:

train_dl_spec = data_gen(train_files[0])

I get the following output:

FILES:  <class 'str'>

It’s not processing __getitem__ for some reason. What could be the reason?

You’re likely better off concatenating those CSV files prior to initializing the dataset object.

Handle all of that outside of the Dataset.

I tried that, but the combined file becomes so big that it no longer fits in RAM.
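One way around the memory limit is to leave the files separate and look rows up on demand. The following is a minimal sketch under stated assumptions, not a definitive implementation: the class name `LazyCsvRows` and the demo file names are hypothetical, every cell is assumed to be numeric, and a plain class is used so the snippet runs without PyTorch. In practice you would subclass `torch.utils.data.Dataset` and return a tensor.

```python
import bisect
import csv
import itertools
import os
import tempfile


class LazyCsvRows:
    """Map-style access to rows spread over several CSV files.

    Only one row is parsed per lookup; the files are never concatenated.
    """

    def __init__(self, paths):
        self.paths = list(paths)
        # Count rows once per file and keep running totals,
        # e.g. [3, 5] for files with 3 and 2 rows.
        self.cumulative = []
        total = 0
        for path in self.paths:
            with open(path, newline="") as f:
                total += sum(1 for _ in csv.reader(f))
            self.cumulative.append(total)

    def __len__(self):
        return self.cumulative[-1] if self.cumulative else 0

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Find the file that holds global row i, then the row inside it.
        file_idx = bisect.bisect_right(self.cumulative, i)
        start = self.cumulative[file_idx - 1] if file_idx else 0
        with open(self.paths[file_idx], newline="") as f:
            row = next(itertools.islice(csv.reader(f), i - start, None))
        return [float(x) for x in row]


# Demo with two small temporary files (hypothetical contents).
tmp = tempfile.mkdtemp()
contents = {"a.csv": "1,2\n3,4\n5,6\n", "b.csv": "7,8\n9,10\n"}
paths = []
for name, text in contents.items():
    path = os.path.join(tmp, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

ds = LazyCsvRows(paths)
print(len(ds))  # 5
print(ds[3])    # [7.0, 8.0]
```

Each lookup rescans its file from the start, so this trades speed for memory. Caching byte offsets per row would be a natural refinement if lookups turn out to be slow.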

I’m not sure I understand the issue correctly.
The mentioned output is created in the __init__, so it seems the Dataset is initialized properly.
What kind of issue are you seeing when calling train_dl_spec[0]?

You are passing train_files[0], which is a string, to self.files. I'm not sure I fully understand your question, but I guess you want data_gen(train_files)[0], which invokes __getitem__ on the data_gen instance.
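To illustrate the difference, here is a minimal sketch using a plain Python class with the same `__getitem__`/`__len__` protocol that `torch.utils.data.Dataset` subclasses implement. The class name `FileList` and the file names are hypothetical, and a plain class is used so the snippet runs without PyTorch.

```python
class FileList:
    """Toy stand-in for a map-style dataset; it only stores file names."""

    def __init__(self, files):
        self.files = files

    def __getitem__(self, i):
        # Runs only when the object is indexed, e.g. ds[0]
        return self.files[i]

    def __len__(self):
        return len(self.files)


train_files = ["a.csv", "b.csv"]  # hypothetical file names

# Passing one string: indexing walks over its characters.
wrong = FileList(train_files[0])
print(len(wrong), wrong[0])  # 5 a

# Passing the list: indexing returns whole file names.
right = FileList(train_files)
print(len(right), right[0])  # 2 a.csv
```

Constructing the object only runs `__init__`. Nothing calls `__getitem__` until the object is indexed directly or iterated by a `DataLoader`.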

I think I was able to solve it. I first restarted the kernel and then edited the class:

import numpy as np
import torch

class data_gen(torch.utils.data.Dataset):
    def __init__(self, files):
        # `files` is a single CSV file name here, not a list
        self.files = files
        # Parse the file once instead of on every __getitem__ call
        my_data = np.genfromtxt('/data/' + files, delimiter=',')
        self.dim = my_data.shape[1]
        # One sample per column, each of shape (1, n_rows)
        self.data = []
        for j in range(self.dim):
            tmp = np.reshape(my_data[:, j], (1, my_data.shape[0]))
            self.data.append(torch.from_numpy(tmp).float())

    def __getitem__(self, i):
        return self.data[i]

    def __len__(self):
        return self.dim

Now it works when I call:

train_loader = torch.utils.data.DataLoader(
    train_dl_spec,
    batch_size=128,
    shuffle=True,
    num_workers=8,
    pin_memory=True,
)