The problem: I have images that I've loaded and stored as numpy arrays. The dataset is quite big, so I realized I have to split it across multiple files and load one file at a time. I've tried to create my own dataset class as follows:
```python
import numpy as np
import torch
from torch.utils.data import Dataset

class my_Dataset(Dataset):
    """Characterizes a dataset for PyTorch."""
    def __init__(self, folder_dataset, transform=None):
        # xs, ys will hold the file names of the data
        self.xs = []
        self.ys = []
        self.transform = transform

        # Open and read the text file listing the whole training data
        # (assuming each line holds an x file name and a y file name)
        with open(folder_dataset + 'data.txt') as f:
            for line in f:
                x_name, y_name = line.split()
                self.xs.append(folder_dataset + x_name)
                self.ys.append(folder_dataset + y_name)

        # pick a random one of these (sub)files of the dataset
        # (note: np.random.randint's upper bound is exclusive)
        file_ID = np.random.randint(1, len(self.xs))
        numpy_data = np.load('x_imgs_ID_' + str(file_ID) + '.npy')
        # move the channel axis to dim 1, i.e. NHWC -> NCHW
        numpy_data = np.moveaxis(numpy_data, -1, 1)
        numpy_target = np.load('y_imgs_ID' + str(file_ID) + '.npy')

        # convert the numpy arrays to tensors
        self.data = torch.from_numpy(numpy_data).float()
        self.target = torch.from_numpy(numpy_target).long()

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # Generates one sample of data, indexed by PyTorch's own sampler
        single_x = self.data[index]
        single_y = self.target[index]
        return single_x, single_y
```
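To make sure I at least understand how `__getitem__` and `DataLoader` interact, I first tried a minimal toy dataset (the shapes and labels here are made up just for testing):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    # A tiny stand-in: 10 fake "images" of shape (3, 4, 4) with integer labels
    def __init__(self):
        self.data = torch.arange(10 * 3 * 4 * 4, dtype=torch.float32).reshape(10, 3, 4, 4)
        self.target = torch.arange(10, dtype=torch.long)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # DataLoader calls this with indices drawn by its sampler
        return self.data[index], self.target[index]

loader = DataLoader(ToyDataset(), batch_size=4)
for x, y in loader:
    print(x.shape, y.shape)
```

This prints batches of shape `(4, 3, 4, 4)` until the last, partial batch of 2, which matches what I expected from the tutorials.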
I'm new to PyTorch and deep learning in general, so I'm trying to learn. Am I doing this correctly in my dataset class? Specifically, the `np.random` part feels super weird to me. I've read a bunch of posts and tutorials trying to understand how this works, but it's still difficult for me to grasp.
Any feedback on this would be very much appreciated!